<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <link rel="stylesheet" href="../../aosa.css" type="text/css">
    <title>500 Lines or Less: Dagoba: An In-Memory Graph Database</title>
  </head>
  <body>

    <div class="titlebox">
      <h1>500 Lines or Less<br>Dagoba: An In-Memory Graph Database</h1>
      <p class="author">Dann Toliver</p>
    </div>

<p><em><a href="https://twitter.com/dann">Dann</a> enjoys building things, like programming languages, databases, distributed systems, communities of smart friendly humans, and pony castles with his two-year-old.</em></p>

<h2 id="prologue">Prologue</h2>

<blockquote>
<p>&quot;When we try to pick out anything by itself we find that it is bound fast by a thousand invisible cords that cannot be broken, to everything in the universe.&quot; —John Muir</p>
</blockquote>

<blockquote>
<p>&quot;What went forth to the ends of the world to traverse not itself, God, the sun, Shakespeare, a commercial traveller, having itself traversed in reality itself becomes that self.&quot; —James Joyce</p>
</blockquote>

<p>A long time ago, when the world was still young, all data walked happily in single file. If you wanted your data to jump over a fence, you just set the fence down in its path and each datum jumped it in turn. Punch cards in, punch cards out. Life was easy and programming was a breeze.</p>

<p>Then came the random access revolution, and data grazed freely across the hillside. Herding data became a serious concern: if you can access any piece of data at any time, how do you know which one to pick next? Techniques were developed for corralling the data by forming links between items<a href="#fn1" class="footnoteRef" id="fnref1"><sup>1</sup></a>, marshaling groups of units into formation through their linking assemblage. Questioning data meant picking a sheep and pulling along everything connected to it.</p>

<p>Later programmers departed from this tradition, imposing a set of rules on how data would be aggregated<a href="#fn2" class="footnoteRef" id="fnref2"><sup>2</sup></a>. Rather than tying disparate data directly together, they clustered it by content, decomposing data into bite-sized pieces, collected in pens and collared with name tags. Questions were posed declaratively, and answering them meant accumulating pieces of partially decomposed data (a state the relationalists refer to as &quot;normal&quot;) into a frankencollection returned to the programmer.</p>

<p>For much of recorded history this relational model reigned supreme. Its dominance went unchallenged through two major language wars and countless skirmishes. It offered everything you could ask for in a model, for the small price of inefficiency, clumsiness and lack of scalability. For eons that was a price programmers were willing to pay. Then the internet happened.</p>

<p>The distributed revolution changed everything, again. Data broke free of spatial constraints and roamed from machine to machine. CAP-wielding theorists busted the relational monopoly, opening the door to new herding techniques, some of which hark back to the earliest attempts to domesticate random-access data. We're going to look at one of these, a style known as the graph database.</p>

<h2 id="take-one">Take One</h2>

<p>Within this chapter we're going to build a graph database<a href="#fn3" class="footnoteRef" id="fnref3"><sup>3</sup></a>. As we build it we're going to explore the problem space, generate multiple solutions for our design decisions, compare those solutions to understand the tradeoffs between them, and finally choose the right solution for our system. A higher-than-usual precedence is put on code compactness, but the process will otherwise mirror that used by software professionals since time immemorial. The purpose of this chapter is to teach this process. And to build a graph database<a href="#fn4" class="footnoteRef" id="fnref4"><sup>4</sup></a>.</p>

<p>Using a graph database will allow us to solve some interesting problems in an elegant fashion. Graphs are a very natural data structure for exploring connections between things. A graph in this sense is a set of vertices and a set of edges; in other words, it's a bunch of dots connected by lines. And a database? A &quot;data base&quot; is like a fort for data. You put data in it and get data back out of it.</p>

<p>So what kinds of problems can we solve with a graph database? Well, suppose that you enjoy tracking ancestral trees: parents, grandparents, cousins twice removed, that kind of thing. You'd like to develop a system that allows you to make natural and elegant queries like &quot;Who are Thor's second cousins once removed?&quot; or &quot;What is Freyja's connection to the Valkyries?&quot;</p>

<p>A reasonable schema for this data structure would be to have a table of entities and a table of relationships. A query for Thor's parents might look like this:</p>

<pre class="sourceCode sql"><code class="sourceCode sql"><span class="kw">SELECT</span> e.* <span class="kw">FROM</span> entities <span class="kw">as</span> e, relationships <span class="kw">as</span> r
<span class="kw">WHERE</span> r.out = <span class="st">&#39;Thor&#39;</span> <span class="kw">AND</span> r.type = <span class="st">&#39;parent&#39;</span> <span class="kw">AND</span> r.in = e.id</code></pre>

<p>But how do we extend that to grandparents? We need another join or subquery for every extra generation, or some other, often vendor-specific, extension to SQL. And by the time we get to second cousins once removed we're going to have <em>a lot</em> of SQL.</p>

<p>What would we like to write? Something both concise and flexible; something that models our query in a natural way and extends to other queries like it. <code>second_cousins('Thor')</code> is concise, but it doesn't give us any flexibility. The SQL above is flexible, but lacks concision.</p>

<p>Something like <code>Thor.parents.parents.parents.children.children.children</code> strikes a reasonably good balance. The primitives give us flexibility to ask many similar questions, but the query is concise and natural. This particular phrasing gives us too many results, as it includes first cousins and siblings, but we're going for gestalt here.</p>

<p>What's the simplest thing we can build that gives us this kind of interface? We could make a list of vertices and a list of edges, just like the relational schema, and then build some helper functions. It might look something like this:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">V = [ <span class="dv">1</span>, <span class="dv">2</span>, <span class="dv">3</span>, <span class="dv">4</span>, <span class="dv">5</span>, <span class="dv">6</span>, <span class="dv">7</span>, <span class="dv">8</span>, <span class="dv">9</span>, <span class="dv">10</span>, <span class="dv">11</span>, <span class="dv">12</span>, <span class="dv">13</span>, <span class="dv">14</span>, <span class="dv">15</span> ]
E = [ [<span class="dv">1</span>,<span class="dv">2</span>], [<span class="dv">1</span>,<span class="dv">3</span>],  [<span class="dv">2</span>,<span class="dv">4</span>],  [<span class="dv">2</span>,<span class="dv">5</span>],  [<span class="dv">3</span>,<span class="dv">6</span>],  [<span class="dv">3</span>,<span class="dv">7</span>],  [<span class="dv">4</span>,<span class="dv">8</span>]
    , [<span class="dv">4</span>,<span class="dv">9</span>], [<span class="dv">5</span>,<span class="dv">10</span>], [<span class="dv">5</span>,<span class="dv">11</span>], [<span class="dv">6</span>,<span class="dv">12</span>], [<span class="dv">6</span>,<span class="dv">13</span>], [<span class="dv">7</span>,<span class="dv">14</span>], [<span class="dv">7</span>,<span class="dv">15</span>] ]

parents = <span class="kw">function</span>(vertices) {
  <span class="kw">var</span> accumulator = []
  <span class="kw">for</span>(<span class="kw">var</span> i=<span class="dv">0</span>; i &lt; <span class="ot">E</span>.<span class="fu">length</span>; i++) {
    <span class="kw">var</span> edge = E[i]
    <span class="kw">if</span>(<span class="ot">vertices</span>.<span class="fu">indexOf</span>(edge[<span class="dv">1</span>]) !== -<span class="dv">1</span>)
      <span class="ot">accumulator</span>.<span class="fu">push</span>(edge[<span class="dv">0</span>])
  }
  <span class="kw">return</span> accumulator
}</code></pre>
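<p>As a quick sanity check, the sketch below runs the loop version against the sample edges. It repeats <code>E</code> and <code>parents</code> from above, adding <code>const</code>/<code>let</code> declarations so it also runs in strict mode. Note that the results are not deduplicated: two siblings contribute the same parent twice.</p>

```javascript
// Sample data from above: each edge is a [parent, child] pair.
const E = [ [1,2], [1,3], [2,4], [2,5], [3,6], [3,7], [4,8],
            [4,9], [5,10], [5,11], [6,12], [6,13], [7,14], [7,15] ]

// Loop version: collect edge[0] (the parent) whenever edge[1] (the child)
// is one of the input vertices.
function parents(vertices) {
  const accumulator = []
  for (let i = 0; i < E.length; i++) {
    const edge = E[i]
    if (vertices.indexOf(edge[1]) !== -1)
      accumulator.push(edge[0])
  }
  return accumulator
}

console.log(parents([8]))           // [ 4 ]
console.log(parents(parents([8])))  // [ 2 ]     (a grandparent)
console.log(parents([4, 5]))        // [ 2, 2 ]  (siblings share a parent, so it appears twice)
```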

<p>The essence of the above function is to iterate over a list, evaluating some code for each item and building up an accumulator of results. That's not quite as clear as it could be, though, because the looping construct introduces some unnecessary complexity.</p>

<p>It'd be nice if there were a more specific looping construct designed for this purpose. As it happens, the <code>reduce</code> function does exactly that: given a list and a function, it evaluates the function for each element of the list, while threading the accumulator through each evaluation pass.</p>

<p>Written in this more functional style our queries are shorter and clearer:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">parents  = (vertices) =&gt; <span class="ot">E</span>.<span class="fu">reduce</span>( (acc, [parent, child])
         =&gt; <span class="ot">vertices</span>.<span class="fu">includes</span>(child)  ? <span class="ot">acc</span>.<span class="fu">concat</span>(parent) : acc , [] )
children = (vertices) =&gt; <span class="ot">E</span>.<span class="fu">reduce</span>( (acc, [parent, child])
         =&gt; <span class="ot">vertices</span>.<span class="fu">includes</span>(parent) ? <span class="ot">acc</span>.<span class="fu">concat</span>(child)  : acc , [] )</code></pre>

<p>Given a list of vertices we reduce over the edges, adding an edge's parent to the accumulator if the edge's child is in our input list. The <code>children</code> function is identical, but examines the edge's parent to determine whether to add the edge's child.</p>
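<p>The reduce-based versions can be checked the same way. This sketch repeats the sample edges and the two arrow-function selectors from above, then asks for the children of 8's parents. The answer contains 8 itself along with its sibling, a small preview of why the longer chain over-reports.</p>

```javascript
// Sample data from above: each edge is a [parent, child] pair.
const E = [ [1,2], [1,3], [2,4], [2,5], [3,6], [3,7], [4,8],
            [4,9], [5,10], [5,11], [6,12], [6,13], [7,14], [7,15] ]

// Destructure each edge into [parent, child] and thread `acc` through reduce.
const parents  = (vertices) => E.reduce((acc, [parent, child]) =>
  vertices.includes(child)  ? acc.concat(parent) : acc, [])
const children = (vertices) => E.reduce((acc, [parent, child]) =>
  vertices.includes(parent) ? acc.concat(child)  : acc, [])

console.log(parents([8]))            // [ 4 ]
console.log(children(parents([8])))  // [ 8, 9 ]  (8 and its sibling 9)
```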

<p>Those functions are valid JavaScript, but they use a few features (arrow functions, parameter destructuring and <code>Array.prototype.includes</code>) which browsers haven't implemented as of this writing. This translated version will work today:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">parents  = <span class="kw">function</span>(x) { <span class="kw">return</span> <span class="ot">E</span>.<span class="fu">reduce</span>(
  <span class="kw">function</span>(acc, e) { <span class="kw">return</span> ~<span class="ot">x</span>.<span class="fu">indexOf</span>(e[<span class="dv">1</span>]) ? <span class="ot">acc</span>.<span class="fu">concat</span>(e[<span class="dv">0</span>]) : acc }, [] )}
children = <span class="kw">function</span>(x) { <span class="kw">return</span> <span class="ot">E</span>.<span class="fu">reduce</span>(
  <span class="kw">function</span>(acc, e) { <span class="kw">return</span> ~<span class="ot">x</span>.<span class="fu">indexOf</span>(e[<span class="dv">0</span>]) ? <span class="ot">acc</span>.<span class="fu">concat</span>(e[<span class="dv">1</span>]) : acc }, [] )}</code></pre>

<p>Now we can say something like:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">    <span class="fu">children</span>(<span class="fu">children</span>(<span class="fu">children</span>(<span class="fu">parents</span>(<span class="fu">parents</span>(<span class="fu">parents</span>([<span class="dv">8</span>]))))))</code></pre>

<p>It reads backwards and gets us lost in silly parens, but is otherwise pretty close to what we wanted. Take a minute to look at the code. Can you see any ways to improve it?</p>
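<p>One way to fix the reading order, sketched here with a hypothetical <code>pipeline</code> helper that is not part of Dagoba, is to thread the vertex list through a list of steps from left to right with <code>reduce</code>. The selectors are the ES2015 versions from above.</p>

```javascript
const E = [ [1,2], [1,3], [2,4], [2,5], [3,6], [3,7], [4,8],
            [4,9], [5,10], [5,11], [6,12], [6,13], [7,14], [7,15] ]

const parents  = (vertices) => E.reduce((acc, [parent, child]) =>
  vertices.includes(child)  ? acc.concat(parent) : acc, [])
const children = (vertices) => E.reduce((acc, [parent, child]) =>
  vertices.includes(parent) ? acc.concat(child)  : acc, [])

// Hypothetical helper: feed `start` through each step in order, so the
// query reads in the same direction it executes.
const pipeline = (start, ...steps) =>
  steps.reduce((vertices, step) => step(vertices), start)

const result = pipeline([8], parents, parents, parents, children, children, children)
console.log(result)  // [ 8, 9, 10, 11, 12, 13, 14, 15 ]
```

<p>The result contains vertex 8 along with its sibling (9), first cousins (10 and 11) and second cousins (12 to 15), which is the over-reporting noted earlier. The global edge list remains, though.</p>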

<p>We're treating the edges as a global variable, which means we can only ever have one database at a time using these helper functions. That's pretty limiting.</p>

<p>We're also not using the vertices at all. What does that tell us? It implies that everything we need is in the edges array, which in this case is true: the vertex values are scalars, so they exist independently in the edges array. If we want to answer questions like &quot;What is Freyja's connection to the Valkyries?&quot; we'll need to add more data to the vertices, which means making them compound values, which means the edges array should reference vertices instead of copying their value.</p>

<p>The same holds true for our edges: they contain an &quot;in&quot; vertex and an &quot;out&quot; vertex<a href="#fn5" class="footnoteRef" id="fnref5"><sup>5</sup></a>, but no elegant way to incorporate additional information. We'll need that to answer questions like &quot;How many stepparents did Loki have?&quot; or &quot;How many children did Odin have before Thor was born?&quot;</p>

<p>You don't have to squint very hard to tell that the code for our two selectors looks very similar, which suggests there may be a deeper abstraction from which they spring.</p>
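<p>One possible factoring, sketched here as an illustration rather than the design Dagoba ends up with, is a hypothetical <code>traverse</code> factory that takes the index of the edge field to match on and the index of the field to collect. Both selectors then fall out as one-liners.</p>

```javascript
const E = [ [1,2], [1,3], [2,4], [2,5], [3,6], [3,7], [4,8],
            [4,9], [5,10], [5,11], [6,12], [6,13], [7,14], [7,15] ]

// Hypothetical factory: build a selector that keeps edges whose `from`
// field is in the input list and collects their `to` field.
const traverse = (from, to) => (vertices) =>
  E.reduce((acc, edge) => vertices.includes(edge[from]) ? acc.concat(edge[to]) : acc, [])

const parents  = traverse(1, 0)  // match on the child, collect the parent
const children = traverse(0, 1)  // match on the parent, collect the child

console.log(parents([8]))   // [ 4 ]
console.log(children([7]))  // [ 14, 15 ]
```

<p>This only removes the duplication: <code>traverse</code> still reads the global <code>E</code>, so the one-graph-at-a-time limitation remains.</p>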

<p>Do you see any other issues?</p>

<h2 id="build-a-better-graph">Build a Better Graph</h2>

<p>Let's solve a few of the problems we've discovered. Having our vertices and edges be global constructs limits us to one graph at a time, but we'd like to have more. To solve this we'll need some structure. Let's start with a namespace.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">Dagoba = {}                                     <span class="co">// the namespace</span></code></pre>

<p>We'll use an object as our namespace. An object in JavaScript is mostly just an unordered set of key/value pairs. We only have four basic data structures to choose from in JavaScript, so we'll be using this one a lot. (A fun question to ask people at parties is &quot;What are the four basic data structures in JavaScript?&quot;)</p>

<p>Now we need some graphs. We could build these using a classic OOP pattern, but JavaScript offers us prototypal inheritance, which means we can build up a prototype object (we'll call it <code>Dagoba.G</code>) and then use a factory function to create new objects that inherit from it. An advantage of this approach is that we can return different types of objects from the factory, instead of binding the creation process to a single class constructor. So we get some extra flexibility for free.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="fu">G</span> = {}                                   <span class="co">// the prototype</span>

<span class="ot">Dagoba</span>.<span class="fu">graph</span> = <span class="kw">function</span>(V, E) {                 <span class="co">// the factory</span>
  <span class="kw">var</span> graph = <span class="ot">Object</span>.<span class="fu">create</span>( <span class="ot">Dagoba</span>.<span class="fu">G</span> )

  <span class="ot">graph</span>.<span class="fu">edges</span>       = []                        <span class="co">// fresh copies so they&#39;re not shared</span>
  <span class="ot">graph</span>.<span class="fu">vertices</span>    = []
  <span class="ot">graph</span>.<span class="fu">vertexIndex</span> = {}                        <span class="co">// a lookup optimization</span>

  <span class="ot">graph</span>.<span class="fu">autoid</span> = <span class="dv">1</span>                              <span class="co">// an auto-incrementing ID counter</span>

  <span class="kw">if</span>(<span class="ot">Array</span>.<span class="fu">isArray</span>(V)) <span class="ot">graph</span>.<span class="fu">addVertices</span>(V)     <span class="co">// arrays only, because you wouldn&#39;t</span>
  <span class="kw">if</span>(<span class="ot">Array</span>.<span class="fu">isArray</span>(E)) <span class="ot">graph</span>.<span class="fu">addEdges</span>(E)        <span class="co">//   call this with singular V and E</span>

  <span class="kw">return</span> graph
}</code></pre>

<p>We'll accept two optional arguments: a list of vertices and a list of edges. JavaScript is rather lax about parameters, so all named parameters are optional and default to <code>undefined</code> if not supplied<a href="#fn6" class="footnoteRef" id="fnref6"><sup>6</sup></a>. We will often have the vertices and edges before building the graph and use the V and E parameters, but it's also common to not have those at creation time and to build the graph up programmatically<a href="#fn7" class="footnoteRef" id="fnref7"><sup>7</sup></a>.</p>

<p>Then we create a new object that has all of our prototype's strengths and none of its weaknesses. We build a brand new array (one of the other basic JS data structures) for our edges, another for the vertices, a new object called <code>vertexIndex</code> and an ID counter—more on those latter two later. (Think: Why can't we just put these in the prototype?)</p>
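<p>Here is a minimal sketch of what goes wrong when mutable state lives on the prototype. The names <code>SharedG</code>, <code>sharedGraph</code>, <code>OwnG</code> and <code>ownGraph</code> are hypothetical. An object made with <code>Object.create</code> looks up properties it does not own on its prototype, so pushing onto <code>a.edges</code> mutates the single array stored on the prototype, and every other graph sees the change.</p>

```javascript
// Anti-pattern: the edge list lives on the prototype and is shared.
const SharedG = { edges: [] }
const sharedGraph = () => Object.create(SharedG)

const a = sharedGraph()
const b = sharedGraph()
a.edges.push('a->b')        // `a` has no own `edges`, so this mutates SharedG.edges
console.log(b.edges)        // [ 'a->b' ]  (b sees a's edge)

// The approach taken in Dagoba.graph: each instance gets its own fresh array.
const OwnG = {}
const ownGraph = () => { const g = Object.create(OwnG); g.edges = []; return g }

const c = ownGraph()
const d = ownGraph()
c.edges.push('c->d')
console.log(d.edges)        // []
```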

<p>Then we call <code>addVertices</code> and <code>addEdges</code> from inside our factory, so let's define those now.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="ot">G</span>.<span class="fu">addVertices</span> = <span class="kw">function</span>(vs) { <span class="ot">vs</span>.<span class="fu">forEach</span>(<span class="kw">this</span>.<span class="ot">addVertex</span>.<span class="fu">bind</span>(<span class="kw">this</span>)) }
<span class="ot">Dagoba</span>.<span class="ot">G</span>.<span class="fu">addEdges</span>    = <span class="kw">function</span>(es) { <span class="ot">es</span>.<span class="fu">forEach</span>(<span class="kw">this</span>.<span class="ot">addEdge</span>  .<span class="fu">bind</span>(<span class="kw">this</span>)) }</code></pre>
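<p>The <code>bind(this)</code> calls matter because <code>forEach</code> invokes its callback as a plain function, so a method passed in unbound loses its receiver. The sketch below uses a hypothetical <code>collector</code> object to show the failure and two fixes; passing the receiver as <code>forEach</code>'s second (<code>thisArg</code>) argument is an equivalent alternative to <code>bind</code>.</p>

```javascript
const collector = {
  items: [],
  add: function (x) {
    'use strict';       // strict mode: an unbound call runs with `this === undefined`
    this.items.push(x)
  }
}
const xs = [1, 2]

// Unbound: `this` is undefined inside `add`, so `this.items` throws a TypeError.
let failed = false
try { xs.forEach(collector.add) } catch (e) { failed = e instanceof TypeError }
console.log(failed)             // true

// Bound, as in addVertices/addEdges: `this` is the collector.
xs.forEach(collector.add.bind(collector))
// Equivalent: pass the receiver as forEach's thisArg.
xs.forEach(collector.add, collector)
console.log(collector.items)    // [ 1, 2, 1, 2 ]
```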

<p>Okay, that was too easy—we're just passing off the work to <code>addVertex</code> and <code>addEdge</code>. We should define those now too.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="ot">G</span>.<span class="fu">addVertex</span> = <span class="kw">function</span>(vertex) {         <span class="co">// accepts a vertex-like object</span>
  <span class="kw">if</span>(!<span class="ot">vertex</span>.<span class="fu">_id</span>)
    <span class="ot">vertex</span>.<span class="fu">_id</span> = <span class="kw">this</span>.<span class="fu">autoid</span>++
  <span class="kw">else</span> <span class="kw">if</span>(<span class="kw">this</span>.<span class="fu">findVertexById</span>(<span class="ot">vertex</span>.<span class="fu">_id</span>))
    <span class="kw">return</span> <span class="ot">Dagoba</span>.<span class="fu">error</span>(<span class="st">&#39;A vertex with that ID already exists&#39;</span>)

  <span class="kw">this</span>.<span class="ot">vertices</span>.<span class="fu">push</span>(vertex)
  <span class="kw">this</span>.<span class="fu">vertexIndex</span>[<span class="ot">vertex</span>.<span class="fu">_id</span>] = vertex         <span class="co">// a fancy index thing</span>
  <span class="ot">vertex</span>.<span class="fu">_out</span> = []; <span class="ot">vertex</span>.<span class="fu">_in</span> = []             <span class="co">// placeholders for edge pointers</span>
  <span class="kw">return</span> <span class="ot">vertex</span>.<span class="fu">_id</span>
}</code></pre>

<p>If the vertex doesn't already have an <code>_id</code> property we assign it one using our autoid.<a href="#fn8" class="footnoteRef" id="fnref8"><sup>8</sup></a> If the <code>_id</code> already exists on a vertex in our graph then we reject the new vertex. Wait, when would that happen? And what exactly is a vertex?</p>

<p>In a traditional object-oriented system we would expect to find a vertex class, of which every vertex would be an instance. We're going to take a different approach and consider as a vertex any object containing the three properties <code>_id</code>, <code>_in</code> and <code>_out</code>. Why is that? Ultimately, it comes down to giving Dagoba control over which data is shared with the host application.</p>

<p>If we create some <code>Dagoba.Vertex</code> instance inside the <code>addVertex</code> function, our internal data will never be shared with the host application. If we accept a <code>Dagoba.Vertex</code> instance as the argument to our <code>addVertex</code> function, the host application could retain a pointer to that vertex object and manipulate it at runtime, breaking our invariants.</p>

<p>So if we create a vertex instance object, we're forced to decide up front whether we will always copy the provided data into a new object—potentially doubling our space usage—or allow the host application unfettered access to the database objects. There's a tension here between performance and protection, and the right balance depends on your specific use case.</p>

<p>Duck typing on the vertex's properties allows us to make that decision at run time, by either deep copying<a href="#fn9" class="footnoteRef" id="fnref9"><sup>9</sup></a> the incoming data or using it directly as a vertex<a href="#fn10" class="footnoteRef" id="fnref10"><sup>10</sup></a>. We don't always want to put the responsibility for balancing safety and performance in the hands of the user, but because these two sets of use cases diverge so widely the extra flexibility is important.</p>
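<p>The sketch below makes the tradeoff concrete. It repeats just enough of the code above to run, plus a <code>findVertexById</code> that simply reads <code>vertexIndex</code>; that helper is not shown in this section, so its body here is an assumption. The JSON round-trip is one simple way to deep copy, suitable only for plain data.</p>

```javascript
const Dagoba = { G: {} }
Dagoba.error = function(msg) { console.log(msg); return false }

// Assumed helper: called but not defined in this section; sketched as a plain lookup.
Dagoba.G.findVertexById = function(id) { return this.vertexIndex[id] }

// Trimmed factory: only the fields addVertex touches.
Dagoba.graph = function() {
  const graph = Object.create(Dagoba.G)
  graph.vertices = []; graph.vertexIndex = {}; graph.autoid = 1
  return graph
}

// addVertex as defined above.
Dagoba.G.addVertex = function(vertex) {
  if (!vertex._id)
    vertex._id = this.autoid++
  else if (this.findVertexById(vertex._id))
    return Dagoba.error('A vertex with that ID already exists')
  this.vertices.push(vertex)
  this.vertexIndex[vertex._id] = vertex
  vertex._out = []; vertex._in = []
  return vertex._id
}

const g = Dagoba.graph()

// Option 1: use the host's object directly. Dagoba annotates it in place,
// and the host keeps a live reference to the stored vertex.
const thor = { name: 'Thor' }
g.addVertex(thor)
console.log(thor._id, Array.isArray(thor._out))    // 1 true
thor.name = 'Donar'
console.log(g.findVertexById(1).name)              // Donar

// Option 2: deep copy first. A JSON round-trip drops functions,
// turns Dates into strings and throws on cycles, so it suits plain data only.
const freyja = { name: 'Freyja' }
const id = g.addVertex(JSON.parse(JSON.stringify(freyja)))
console.log(freyja._id, g.findVertexById(id).name) // undefined Freyja
```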

<p>Now that we've got our new vertex we'll add it to our graph's list of vertices, add it to the <code>vertexIndex</code> for efficient lookup by <code>_id</code>, and add two additional properties to it: <code>_out</code> and <code>_in</code>, which will both become lists of edges<a href="#fn11" class="footnoteRef" id="fnref11"><sup>11</sup></a>.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="ot">G</span>.<span class="fu">addEdge</span> = <span class="kw">function</span>(edge) {             <span class="co">// accepts an edge-like object</span>
  <span class="ot">edge</span>.<span class="fu">_in</span>  = <span class="kw">this</span>.<span class="fu">findVertexById</span>(<span class="ot">edge</span>.<span class="fu">_in</span>)
  <span class="ot">edge</span>.<span class="fu">_out</span> = <span class="kw">this</span>.<span class="fu">findVertexById</span>(<span class="ot">edge</span>.<span class="fu">_out</span>)

  <span class="kw">if</span>(!(<span class="ot">edge</span>.<span class="fu">_in</span> &amp;&amp; <span class="ot">edge</span>.<span class="fu">_out</span>))
    <span class="kw">return</span> <span class="ot">Dagoba</span>.<span class="fu">error</span>(<span class="st">&quot;That edge&#39;s &quot;</span> + (<span class="ot">edge</span>.<span class="fu">_in</span> ? <span class="st">&#39;out&#39;</span> : <span class="st">&#39;in&#39;</span>)
                                       + <span class="st">&quot; vertex wasn&#39;t found&quot;</span>)

  <span class="ot">edge</span>.<span class="ot">_out</span>.<span class="ot">_out</span>.<span class="fu">push</span>(edge)                     <span class="co">// edge&#39;s out vertex&#39;s out edges</span>
  <span class="ot">edge</span>.<span class="ot">_in</span>.<span class="ot">_in</span>.<span class="fu">push</span>(edge)                       <span class="co">// vice versa</span>

  <span class="kw">this</span>.<span class="ot">edges</span>.<span class="fu">push</span>(edge)
}</code></pre>

<p>First we find both vertices which the edge connects, then reject the edge if it's missing either vertex. We'll use a helper function to log an error on rejection. All errors flow through this helper function, so we can override its behavior on a per-application basis. We could later extend this to allow <code>onError</code> handlers to be registered, so the host application could link in its own callbacks without overwriting the helper. We might allow such handlers to be registered per-graph, per-application, or both, depending on the level of flexibility required.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="fu">error</span> = <span class="kw">function</span>(msg) {
  <span class="ot">console</span>.<span class="fu">log</span>(msg)
  <span class="kw">return</span> <span class="kw">false</span>
}</code></pre>
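<p>Because every error goes through <code>Dagoba.error</code>, a host application can swap in its own policy by reassigning it. The sketch below is a hypothetical host-side setup, not part of Dagoba: it collects messages in an array instead of logging them, while keeping the <code>false</code> return value that callers pass along.</p>

```javascript
const Dagoba = {}
Dagoba.error = function(msg) {      // the default helper from above
  console.log(msg)
  return false
}

// Hypothetical host-side override: collect messages instead of logging them.
const errors = []
Dagoba.error = function(msg) {
  errors.push(msg)
  return false                      // keep the contract: failing calls still return false
}

const result = Dagoba.error("That edge's in vertex wasn't found")
console.log(result, errors.length)  // false 1
```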

<p>Then we'll add our new edge to both vertices' edge lists: the edge's out vertex's list of out-side edges, and the in vertex's list of in-side edges.</p>
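<p>Putting the pieces of this section together, the sketch below builds a two-vertex graph and checks the pointer wiring that <code>addEdge</code> sets up. As before, <code>findVertexById</code> is assumed to be a plain lookup in <code>vertexIndex</code>, since this section calls it without defining it; the vertex names are illustrative.</p>

```javascript
const Dagoba = { G: {} }
Dagoba.error = function(msg) { console.log(msg); return false }

Dagoba.graph = function(V, E) {
  const graph = Object.create(Dagoba.G)
  graph.edges = []; graph.vertices = []; graph.vertexIndex = {}; graph.autoid = 1
  if (Array.isArray(V)) graph.addVertices(V)
  if (Array.isArray(E)) graph.addEdges(E)
  return graph
}

// Assumed helper: a plain lookup by _id.
Dagoba.G.findVertexById = function(id) { return this.vertexIndex[id] }

Dagoba.G.addVertices = function(vs) { vs.forEach(this.addVertex.bind(this)) }
Dagoba.G.addEdges    = function(es) { es.forEach(this.addEdge.bind(this)) }

Dagoba.G.addVertex = function(vertex) {
  if (!vertex._id) vertex._id = this.autoid++
  else if (this.findVertexById(vertex._id))
    return Dagoba.error('A vertex with that ID already exists')
  this.vertices.push(vertex)
  this.vertexIndex[vertex._id] = vertex
  vertex._out = []; vertex._in = []
  return vertex._id
}

Dagoba.G.addEdge = function(edge) {
  edge._in  = this.findVertexById(edge._in)   // swap IDs for vertex pointers
  edge._out = this.findVertexById(edge._out)
  if (!(edge._in && edge._out))
    return Dagoba.error("That edge's " + (edge._in ? 'out' : 'in') + " vertex wasn't found")
  edge._out._out.push(edge)
  edge._in._in.push(edge)
  this.edges.push(edge)
}

// Vertices without an _id receive 1 and 2 from autoid; the edge refers to them by ID.
const g = Dagoba.graph([{ name: 'Odin' }, { name: 'Thor' }], [{ _out: 1, _in: 2 }])

const odin = g.findVertexById(1), thor = g.findVertexById(2)
console.log(odin._out[0]._in.name)            // Thor  (the edge now points at vertex objects)
console.log(thor._in[0] === odin._out[0])     // true  (one edge object in both lists)
console.log(g.addVertex({ _id: 2 }))          // logs the duplicate-ID error, then false
console.log(g.addEdge({ _out: 1, _in: 99 }))  // logs "That edge's in vertex wasn't found", then false
```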

<p>And that's all the graph structure we need for now!</p>

<h2 id="enter-the-query">Enter the Query</h2>

<p>There are really only two parts to this system: the part that holds the graph and the part that answers questions about the graph. The part that holds the graph is pretty simple, as we've seen. The query part is a little trickier.</p>

<p>We'll start just like before, with a prototype and a query factory.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="fu">Q</span> = {}

<span class="ot">Dagoba</span>.<span class="fu">query</span> = <span class="kw">function</span>(graph) {                <span class="co">// factory</span>
  <span class="kw">var</span> query = <span class="ot">Object</span>.<span class="fu">create</span>( <span class="ot">Dagoba</span>.<span class="fu">Q</span> )

  <span class="ot">query</span>.   <span class="fu">graph</span> = graph                        <span class="co">// the graph itself</span>
  <span class="ot">query</span>.   <span class="fu">state</span> = []                           <span class="co">// state for each step</span>
  <span class="ot">query</span>. <span class="fu">program</span> = []                           <span class="co">// list of steps to take</span>
  <span class="ot">query</span>.<span class="fu">gremlins</span> = []                           <span class="co">// gremlins for each step</span>

  <span class="kw">return</span> query
}</code></pre>

<p>Now's a good time to introduce some friends.</p>

<p>A <em>program</em> is a series of <em>steps</em>. Each step is like a pipe in a pipeline—a piece of data comes in one end, is transformed in some fashion, and goes out the other end. Our pipeline doesn't quite work like that, but it's a good first approximation.</p>

<p>Each step in our program can have <em>state</em>, and <code>query.state</code> is a list of per-step states whose indices correspond to the steps in <code>query.program</code>.</p>

<p>A <em>gremlin</em> is a creature that travels through the graph doing our bidding. Gremlins might be surprising things to find in a database, but they trace their heritage back to TinkerPop's <a href="http://euranova.eu/upl_docs/publications/an-empirical-comparison-of-graph-databases.pdf">Blueprints</a>, and the <a href="http://edbt.org/Proceedings/2013-Genova/papers/workshops/a29-holzschuher.pdf">Gremlin and Pacer query languages</a>. They remember where they've been and allow us to find answers to interesting questions.</p>

<p>Remember that question we wanted to answer about Thor's second cousins once removed? We decided <code>Thor.parents.parents.parents.children.children.children</code> was a pretty good way of expressing that. Each <code>parents</code> or <code>children</code> instance is a step in our program. Each of those steps contains a reference to its <em>pipetype</em>, which is the function that performs that step's operation.</p>

<p>That query in our actual system might look like:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">    <span class="ot">g</span>.<span class="fu">v</span>(<span class="st">&#39;Thor&#39;</span>).<span class="fu">out</span>().<span class="fu">out</span>().<span class="fu">out</span>().<span class="fu">in</span>().<span class="fu">in</span>().<span class="fu">in</span>()</code></pre>

<p>Each of the steps is a function call, and so they can take <em>arguments</em>. The interpreter passes the step's arguments to the step's pipetype function, so in the query <code>g.v('Thor').out(2, 3)</code> the <code>out</code> pipetype function would receive <code>[2, 3]</code> as its first parameter.</p>

<p>We'll need a way to add steps to our query. Here's a helper function for that:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="ot">Q</span>.<span class="fu">add</span> = <span class="kw">function</span>(pipetype, args) { <span class="co">// add a new step to the query</span>
  <span class="kw">var</span> step = [pipetype, args]
  <span class="kw">this</span>.<span class="ot">program</span>.<span class="fu">push</span>(step)                 <span class="co">// step is a pair of pipetype and its args</span>
  <span class="kw">return</span> <span class="kw">this</span>
}</code></pre>

<p>Each step is a composite entity, combining the pipetype function with the arguments to apply to that function. We could combine the two into a partially applied function at this stage, instead of using a tuple<a href="#fn12" class="footnoteRef" id="fnref12"><sup>12</sup></a>, but then we'd lose some introspective power that will prove helpful later.</p>
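<p>A small sketch of the introspection argument: a step stored as a <code>[pipetype, args]</code> pair can be examined before anything runs, while a partially applied closure can only be called. The <code>outPipe</code> function here is a hypothetical stand-in, not Dagoba's real <code>out</code> pipetype.</p>

```javascript
// Hypothetical stand-in for a pipetype function.
const outPipe = (args, input) => 'out(' + args.join(', ') + ') applied to ' + input

// Tuple form, as stored by Dagoba.Q.add: each step's pipetype and args stay visible.
const program = [['vertex', ['Thor']], ['out', [2, 3]]]
console.log(program.map(([pipetype]) => pipetype))  // [ 'vertex', 'out' ]

// Partially applied form: the same behaviour once called, but opaque beforehand.
const partial = (input) => outPipe([2, 3], input)
console.log(partial('Thor'))                         // out(2, 3) applied to Thor
```

<p>Short of parsing its source text, nothing about <code>partial</code> reveals that it wraps <code>out</code> with <code>[2, 3]</code>, so code that wants to examine or rewrite a program before running it would have nothing to work with.</p>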

<p>We'll use a small set of query initializers that generate a new query from a graph. Here's one that starts most of our examples: the <code>v</code> method. It builds a new query, then uses our <code>add</code> helper to populate the initial query program. This makes use of the <code>vertex</code> pipetype, which we'll look at soon.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="ot">G</span>.<span class="fu">v</span> = <span class="kw">function</span>() {                       <span class="co">// query initializer: g.v() -&gt; query</span>
  <span class="kw">var</span> query = <span class="ot">Dagoba</span>.<span class="fu">query</span>(<span class="kw">this</span>)
  <span class="ot">query</span>.<span class="fu">add</span>(<span class="st">&#39;vertex&#39;</span>, [].<span class="ot">slice</span>.<span class="fu">call</span>(arguments)) <span class="co">// add a step to our program</span>
  <span class="kw">return</span> query
}</code></pre>
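<p>To see what the initializer produces, the sketch below repeats the query prototype, the factory, <code>add</code> and <code>v</code> from above, then inspects the program. Nothing has run yet: the query is only a list of <code>[pipetype, args]</code> steps. Extra steps are appended with <code>add</code> directly, because the <code>out</code> and <code>in</code> methods are not defined in this section.</p>

```javascript
const Dagoba = { G: {}, Q: {} }

Dagoba.query = function(graph) {
  const query = Object.create(Dagoba.Q)
  query.graph = graph; query.state = []; query.program = []; query.gremlins = []
  return query
}

Dagoba.Q.add = function(pipetype, args) {
  this.program.push([pipetype, args])
  return this                          // returning the query enables chaining
}

Dagoba.G.v = function() {
  const query = Dagoba.query(this)
  query.add('vertex', [].slice.call(arguments))
  return query
}

const g = Object.create(Dagoba.G)      // a bare graph is enough: v() never reads vertices
const q = g.v('Thor').add('out', []).add('in', [2, 3])
console.log(JSON.stringify(q.program)) // [["vertex",["Thor"]],["out",[]],["in",[2,3]]]
```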

<p>Note that <code>[].slice.call(arguments)</code> is JS parlance for &quot;please pass me an array of this function's arguments&quot;. You would be forgiven for supposing that <code>arguments</code> is already an array, since it behaves like one in many situations, but it is lacking much of the functionality we utilize in modern JavaScript arrays.</p>
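<p>The sketch below, using a hypothetical <code>inspect</code> function, shows the difference. In engines that support ES2015, <code>Array.from(arguments)</code> or a rest parameter such as <code>function(...args)</code> does the same job as the <code>slice</code> idiom.</p>

```javascript
function inspect() {
  const copy = [].slice.call(arguments)     // the idiom used in Dagoba.G.v
  return {
    argsIsArray: Array.isArray(arguments),  // false: array-like, not a real Array
    argsMap:     typeof arguments.map,      // 'undefined': no Array.prototype methods
    copyIsArray: Array.isArray(copy),       // true
    copy                                    // [ 'Thor', 2 ] for inspect('Thor', 2)
  }
}
console.log(inspect('Thor', 2))

// ES2015 equivalents, where supported
const viaFrom = function() { return Array.from(arguments) }
const viaRest = (...args) => args
console.log(viaFrom('Thor', 2), viaRest('Thor', 2))   // [ 'Thor', 2 ] [ 'Thor', 2 ]
```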

<h2 id="the-problem-with-being-eager">The Problem with Being Eager</h2>

<p>Before we look at the pipetypes themselves we're going to take a diversion into the exciting world of execution strategy. There are two main schools of thought: the Call By Value clan, also known as eager beavers, are strict in their insistence that all arguments be evaluated before the function is applied. Their opposing faction, the Call By Needians, are content to procrastinate until the last possible moment before doing anything—they are, in a word, lazy.</p>

<p>JavaScript, being a strict language, will process each of our steps as it is called. We would then expect the evaluation of <code>g.v('Thor').out().in()</code> to first find the Thor vertex, then find all vertices connected to it by outgoing edges, and from each of those vertices finally return all vertices they are connected to by inbound edges.</p>

<p>In a non-strict language we would get the same result—the execution strategy doesn't make much difference here. But what if we added a few additional calls? Given how well-connected Thor is, our <code>g.v('Thor').out().out().out().in().in().in()</code> query may produce many results—in fact, because we're not limiting our vertex list to unique results, it may produce many more results than we have vertices in our total graph.</p>

<p>We're probably only interested in getting a few unique results out, so we'll change the query a bit: <code>g.v('Thor').out().out().out().in().in().in().unique().take(10)</code>. Now our query produces at most 10 results. What happens if we evaluate this eagerly, though? We're still going to have to build up septillions of results before returning only the first 10.</p>

<p>All graph databases have to support a mechanism for doing as little work as possible, and most choose some form of non-strict evaluation to do so. Since we're building our own interpreter, the lazy evaluation of our program is possible, but we may have to contend with some consequences.</p>
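<p>To put numbers on the difference, here is a small illustration outside Dagoba. It pairs an eager pipeline that materializes every level with a lazy one built from ES2015 generators. The hypothetical <code>neighbors</code> function gives every node three neighbors, standing in for a well-connected vertex. It also counts how often it is called while we ask each strategy for the first ten nodes six steps out.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">var calls = 0
function neighbors(n) {                 // hypothetical fan-out: three neighbors per node
  calls++
  return [n * 3, n * 3 + 1, n * 3 + 2]
}

function eagerSteps(start, depth) {     // eager: build each whole level before the next
  var frontier = [start]
  while (depth--) frontier = [].concat.apply([], frontier.map(neighbors))
  return frontier
}

function* lazyStep(source) {            // lazy: expand a node only when asked for output
  for (var n of source) yield* neighbors(n)
}

function lazySteps(start, depth) {      // chain one generator per step; nothing runs yet
  var source = [start]
  while (depth--) source = lazyStep(source)
  return source
}

function* take(source, k) {             // stop pulling once k items have been yielded
  if (k === 0) return
  var count = 0
  for (var x of source) {
    yield x
    if (++count === k) return
  }
}

calls = 0
var eager = eagerSteps(1, 6).slice(0, 10)
var eagerCalls = calls

calls = 0
var lazy = Array.from(take(lazySteps(1, 6), 10))
var lazyCalls = calls

console.log(eager.join() === lazy.join())   // true: same answer
console.log(eagerCalls, lazyCalls)          // 364 10</code></pre>

<p>Both strategies return the same ten nodes. The eager one expands all 364 interior nodes of the first six levels, while the lazy one expands only the ten it needs.</p>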

<h2 id="ramifications-of-evaluation-strategy-on-our-mental-model">Ramifications of Evaluation Strategy on our Mental Model</h2>

<p>Up until now our mental model for evaluation has been very simple:</p>

<ul>
<li>request a set of vertices</li>
<li>pass the returned set as input to a pipe</li>
<li>repeat as necessary</li>
</ul>

<p>We would like to retain that model for our users, because it's easier to reason about, but as we've seen we can no longer use that model for the implementation. Having users think in a model that differs from the actual implementation is a source of much pain. A leaky abstraction is a small-scale version of this; in the large it can lead to frustration, cognitive dissonance and ragequits.</p>

<p>Our case is nearly optimal for this deception, though: the answer to any query will be the same, regardless of execution model. The only difference is the performance. The tradeoff is between having all users learn a more complicated model prior to using the system, or forcing a subset of users to transfer from the simple model to the complicated model in order to better reason about query performance.</p>

<p>Some factors to consider when wrestling with this decision are:</p>

<ul>
<li>the relative cognitive difficulty of learning the simple model versus the more complex model;</li>
<li>the additional cognitive load imposed by first using the simple model and then advancing to the complex one versus skipping the simple and learning only the complex;</li>
<li>the subset of users required to make the transition, in terms of their proportional size, cognitive availability, available time, and so on.</li>
</ul>

<p>In our case this tradeoff makes sense. For most uses, queries will return results fast enough that users needn't be concerned with optimizing their query structure or learning the deeper model. The users who do need to care are those writing advanced queries over large datasets, and they are also likely the users best equipped to transition to a new model. Additionally, our hope is that using the simple model before learning the more complex one imposes only a small increase in difficulty.</p>

<p>We'll go into more detail on this new model soon, but in the meantime here are some highlights to keep in mind during the next section:</p>

<ul>
<li>Each pipe returns one result at a time, not a set of results. Each pipe may be activated many times while evaluating a query.</li>
<li>A read/write head controls which pipe is activated next. The head starts at the end of the pipeline, and its movement is directed by the result of the currently active pipe.</li>
<li>That result might be one of the aforementioned gremlins. Each gremlin represents a potential query result, and they carry state with them through the pipes. Gremlins cause the head to move to the right.</li>
<li>A pipe can return a result of 'pull', which signals the head that it needs input and moves it to the left.</li>
<li>A result of 'done' tells the head that nothing prior needs to be activated again, and moves the head right.</li>
</ul>
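<p>The bullets above are easier to follow with something to run. What follows is a deliberately stripped-down runner of our own, not Dagoba's interpreter. Gremlins are replaced by plain <code>{value: ...}</code> objects. Each hypothetical pipe is a <code>function(input, state)</code> that returns a result, <code>'pull'</code>, or <code>'done'</code>. The head starts at the last pipe, moves left on <code>'pull'</code>, and moves right when a pipe returns a result or finishes.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">function runPipeline(pipes) {
  var max = pipes.length - 1
  var states = pipes.map(function() { return {} })   // each pipe keeps its own state
  var results = []
  var done = -1        // rightmost pipe known to be exhausted
  var pc = max         // the head starts at the end of the pipeline
  var value = false

  while (done !== max) {
    value = pipes[pc](value, states[pc])

    if (value === 'pull') {          // pipe needs input: move the head left
      value = false
      if (pc - 1 !== done) { pc--; continue }
      done = pc                      // nothing upstream can supply input, so this pipe is finished too
    }

    if (value === 'done') {          // nothing before this pipe needs activating again
      value = false
      done = pc
    }

    pc++                             // a result (or a finished pipe) moves the head right

    if (pc === pipes.length) {       // fell off the end: collect any result
      if (value) results.push(value)
      value = false
      pc--
    }
  }
  return results
}

function source(items) {             // hypothetical: emits one item per activation, then 'done'
  return function(input, state) {
    if (!state.queue) state.queue = items.slice()
    if (!state.queue.length) return 'done'
    return { value: state.queue.shift() }
  }
}

function keepOdd(input, state) {     // hypothetical filter: odd values pass, others pull again
  if (!input) return 'pull'
  return input.value % 2 ? input : 'pull'
}

function timesTwo(input, state) {    // hypothetical map step
  if (!input) return 'pull'
  return { value: input.value * 2 }
}

var out = runPipeline([source([1, 2, 3, 4, 5]), keepOdd, timesTwo])
console.log(out.map(function(r) { return r.value }).join(','))   // 2,6,10</code></pre>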

<h2 id="pipetypes">Pipetypes</h2>

<p>Pipetypes make up the core of our system. Once we understand how each one works, we'll have a better basis for understanding how they're invoked and sequenced together in the interpreter.</p>

<p>We'll start by making a place to put our pipetypes, and a way to add new ones.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="fu">Pipetypes</span> = {}

<span class="ot">Dagoba</span>.<span class="fu">addPipetype</span> = <span class="kw">function</span>(name, fun) {              <span class="co">// adds a chainable method</span>
  <span class="ot">Dagoba</span>.<span class="fu">Pipetypes</span>[name] = fun
  <span class="ot">Dagoba</span>.<span class="fu">Q</span>[name] = <span class="kw">function</span>() {
    <span class="kw">return</span> <span class="kw">this</span>.<span class="fu">add</span>(name, [].<span class="ot">slice</span>.<span class="fu">apply</span>(arguments)) }  <span class="co">// capture pipetype and args</span>
}</code></pre>

<p>The pipetype's function is added to the list of pipetypes, and then a new method is added to the query object. Every pipetype must have a corresponding query method. That method adds a new step to the query program, along with its arguments.</p>

<p>When we evaluate <code>g.v('Thor').out('parent').in('parent')</code> the <code>v</code> call returns a query object, the <code>out</code> call adds a new step and returns the query object, and the <code>in</code> call does the same. This is what enables our method-chaining API.</p>
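<p>Here is that chaining in isolation. Because the sketch has to stand alone, it includes minimal stand-ins for <code>Dagoba.Q.add</code> and <code>Dagoba.query</code> that simply record each step. <code>addPipetype</code> is as defined above.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">var Dagoba = { Pipetypes: {}, Q: {} }

Dagoba.Q.add = function(pipetype, args) {     // stand-in: record the step, return the query
  this.program.push([pipetype, args])
  return this
}

Dagoba.query = function() {                   // stand-in: an empty query inheriting from Q
  var query = Object.create(Dagoba.Q)
  query.program = []
  return query
}

Dagoba.addPipetype = function(name, fun) {    // as defined above
  Dagoba.Pipetypes[name] = fun
  Dagoba.Q[name] = function() {
    return this.add(name, [].slice.apply(arguments))
  }
}

Dagoba.addPipetype('out', function() { return 'pull' })   // bodies don't matter here
Dagoba.addPipetype('in',  function() { return 'pull' })

var q = Dagoba.query().out('parent').in('parent')
console.log(JSON.stringify(q.program))        // [["out",["parent"]],["in",["parent"]]]</code></pre>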

<p>Note that adding a new pipetype with the same name replaces the existing one, which allows runtime modification of existing pipetypes. What's the cost of this decision? What are the alternatives?</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="fu">getPipetype</span> = <span class="kw">function</span>(name) {
  <span class="kw">var</span> pipetype = <span class="ot">Dagoba</span>.<span class="fu">Pipetypes</span>[name]                 <span class="co">// a pipetype is a function</span>

  <span class="kw">if</span>(!pipetype)
    <span class="ot">Dagoba</span>.<span class="fu">error</span>(<span class="st">&#39;Unrecognized pipetype: &#39;</span> + name)

  <span class="kw">return</span> pipetype || <span class="ot">Dagoba</span>.<span class="fu">fauxPipetype</span>
}</code></pre>

<p>If we can't find a pipetype, we generate an error and return the default pipetype, which acts like an empty conduit: if a message comes in one side, it gets passed out the other.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="fu">fauxPipetype</span> = <span class="kw">function</span>(_, _, maybe_gremlin) {   <span class="co">// pass the result upstream</span>
  <span class="kw">return</span> maybe_gremlin || <span class="st">&#39;pull&#39;</span>                        <span class="co">// or send a pull downstream</span>
}</code></pre>

<p>See those underscores? We use those to label params that won't be used in our function. Most other pipetypes will use all four parameters, and have all four parameter names. This allows us to distinguish at a glance which parameters a particular pipetype relies on.</p>

<p>This underscore technique is also important because it makes the comments line up nicely. No, seriously. If programs <a href="https://mitpress.mit.edu/sicp/front/node3.html">&quot;must be written for people to read, and only incidentally for machines to execute&quot;</a>, then it immediately follows that our predominant concern should be making code pretty.</p>
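<p>To watch the fallback in action, the sketch below looks up a misspelled pipetype. <code>Dagoba.error</code> is replaced with a hypothetical stand-in that records messages. We also rename the unused parameters to <code>_graph</code> and <code>_args</code>. Repeating <code>_</code> works in sloppy-mode JavaScript, but duplicate parameter names are a SyntaxError in strict-mode code and in ES modules.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">var Dagoba = { Pipetypes: {} }
var errors = []
Dagoba.error = function(msg) { errors.push(msg) }        // stand-in: record instead of logging

Dagoba.fauxPipetype = function(_graph, _args, maybe_gremlin) {
  return maybe_gremlin || 'pull'                         // pass a gremlin on, or ask for one
}

Dagoba.getPipetype = function(name) {
  var pipetype = Dagoba.Pipetypes[name]
  if(!pipetype)
    Dagoba.error('Unrecognized pipetype: ' + name)
  return pipetype || Dagoba.fauxPipetype
}

var pipe = Dagoba.getPipetype('outt')                    // misspelled on purpose
var gremlin = { vertex: { _id: 'Thor' }, state: {} }

console.log(pipe === Dagoba.fauxPipetype)                // true
console.log(pipe(null, [], gremlin) === gremlin)         // true: gremlins pass straight through
console.log(pipe(null, [], false))                       // pull
console.log(errors[0])                                   // Unrecognized pipetype: outt</code></pre>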

<h4 id="vertex">Vertex</h4>

<p>Most pipetypes we meet will take a gremlin and produce more gremlins, but this particular pipetype generates gremlins from just a string. Given a vertex ID it returns a single new gremlin. Given a query it will find all matching vertices, and yield one new gremlin at a time until it has worked through them.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="fu">addPipetype</span>(<span class="st">&#39;vertex&#39;</span>, <span class="kw">function</span>(graph, args, gremlin, state) {
  <span class="kw">if</span>(!<span class="ot">state</span>.<span class="fu">vertices</span>)
    <span class="ot">state</span>.<span class="fu">vertices</span> = <span class="ot">graph</span>.<span class="fu">findVertices</span>(args)       <span class="co">// state initialization</span>

  <span class="kw">if</span>(!<span class="ot">state</span>.<span class="ot">vertices</span>.<span class="fu">length</span>)                        <span class="co">// all done</span>
    <span class="kw">return</span> <span class="st">&#39;done&#39;</span>

  <span class="kw">var</span> vertex = <span class="ot">state</span>.<span class="ot">vertices</span>.<span class="fu">pop</span>()                 <span class="co">// OPT: requires vertex cloning</span>
  <span class="kw">return</span> <span class="ot">Dagoba</span>.<span class="fu">makeGremlin</span>(vertex, <span class="ot">gremlin</span>.<span class="fu">state</span>)  <span class="co">// gremlins from as/back queries</span>
})</code></pre>

<p>We first check to see if we've already gathered matching vertices, otherwise we try to find some. If there are any vertices, we'll pop one off and return a new gremlin sitting on that vertex. Each gremlin can carry around its own state, like a journal of where it's been and what interesting things it has seen on its journey through the graph. If we receive a gremlin as input to this step we'll hand its journal on to the exiting gremlin.</p>

<p>Note that we're directly mutating the state argument here, and not passing it back. An alternative would be to return an object instead of a gremlin or signal, and pass state back that way. That complicates our return value, and creates some additional garbage <a href="#fn13" class="footnoteRef" id="fnref13"><sup>13</sup></a>. If JS allowed multiple return values it would make this option more elegant.</p>

<p>We would still need to find a way to deal with the mutations, though, as the call site maintains a reference to the original variable. What if we had some way to determine whether a particular reference is &quot;unique&quot;—that it is the only reference to that object?</p>

<p>If we know a reference is unique then we can get the benefits of immutability while avoiding expensive copy-on-write schemes or complicated persistent data structures. With only one reference we can't tell whether the object has been mutated or a new object has been returned with the changes we requested: &quot;observed immutability&quot; is maintained <a href="#fn14" class="footnoteRef" id="fnref14"><sup>14</sup></a>.</p>

<p>There are a couple of common ways of determining this: in a statically typed system we might make use of uniqueness types <a href="#fn15" class="footnoteRef" id="fnref15"><sup>15</sup></a> to guarantee at compile time that each object has only one reference. If we had a reference counter <a href="#fn16" class="footnoteRef" id="fnref16"><sup>16</sup></a>—even just a cheap two-bit sticky counter—we could know at runtime that an object only has one reference and use that knowledge to our advantage.</p>

<p>JavaScript doesn't have either of these facilities, but we can get almost the same effect if we're really, really disciplined. Which we will be. For now.</p>
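<p>We can exercise the vertex pipetype on its own by calling it repeatedly with the same <code>state</code> object. The toy graph is hypothetical, its <code>findVertices</code> only looks up IDs, and <code>Dagoba.makeGremlin</code> is a simplified stand-in of our own. Note how <code>state.vertices</code> is mutated in place between calls.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">var Dagoba = { Pipetypes: {} }
Dagoba.addPipetype = function(name, fun) { Dagoba.Pipetypes[name] = fun }

Dagoba.makeGremlin = function(vertex, state) {   // stand-in: a vertex plus a journal
  return { vertex: vertex, state: state || {} }
}

Dagoba.addPipetype('vertex', function(graph, args, gremlin, state) {
  if(!state.vertices)
    state.vertices = graph.findVertices(args)     // state initialization
  if(!state.vertices.length)                      // all done
    return 'done'
  var vertex = state.vertices.pop()
  return Dagoba.makeGremlin(vertex, gremlin.state)
})

var graph = {                                     // hypothetical toy graph
  vertices: { Thor: { _id: 'Thor' }, Loki: { _id: 'Loki' } },
  findVertices: function(ids) {                   // stand-in: look up IDs only
    var vertices = this.vertices
    return ids.map(function(id) { return vertices[id] }).filter(Boolean)
  }
}

var pipe = Dagoba.Pipetypes.vertex
var state = {}                                    // shared across calls, as in a running query
var seen = []
var result = pipe(graph, ['Thor', 'Loki'], false, state)
while (result !== 'done') {
  seen.push(result.vertex._id)
  result = pipe(graph, ['Thor', 'Loki'], false, state)
}
console.log(seen.join(','))                       // Loki,Thor (pop takes from the end)
console.log(state.vertices.length)                // 0</code></pre>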

<h4 id="in-n-out">In-N-Out</h4>

<p>Walking the graph is as easy as ordering a burger. These two lines set up the <code>in</code> and <code>out</code> pipetypes for us.</p>


<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="fu">addPipetype</span>(<span class="st">&#39;out&#39;</span>, <span class="ot">Dagoba</span>.<span class="fu">simpleTraversal</span>(<span class="st">&#39;out&#39;</span>))
<span class="ot">Dagoba</span>.<span class="fu">addPipetype</span>(<span class="st">&#39;in&#39;</span>,  <span class="ot">Dagoba</span>.<span class="fu">simpleTraversal</span>(<span class="st">&#39;in&#39;</span>))</code></pre>

<p>The <code>simpleTraversal</code> function returns a pipetype handler that accepts a gremlin as its input, and spawns a new gremlin each time it's queried. Once those gremlins are gone, it sends back a 'pull' request to get a new gremlin from its predecessor.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="fu">simpleTraversal</span> = <span class="kw">function</span>(dir) {
  <span class="kw">var</span> find_method = dir == <span class="st">&#39;out&#39;</span> ? <span class="st">&#39;findOutEdges&#39;</span> : <span class="st">&#39;findInEdges&#39;</span>
  <span class="kw">var</span> edge_list   = dir == <span class="st">&#39;out&#39;</span> ? <span class="st">&#39;_in&#39;</span> : <span class="st">&#39;_out&#39;</span>

  <span class="kw">return</span> <span class="kw">function</span>(graph, args, gremlin, state) {
    <span class="kw">if</span>(!gremlin &amp;&amp; (!<span class="ot">state</span>.<span class="fu">edges</span> || !<span class="ot">state</span>.<span class="ot">edges</span>.<span class="fu">length</span>))     <span class="co">// query initialization</span>
      <span class="kw">return</span> <span class="st">&#39;pull&#39;</span>

    <span class="kw">if</span>(!<span class="ot">state</span>.<span class="fu">edges</span> || !<span class="ot">state</span>.<span class="ot">edges</span>.<span class="fu">length</span>) {                 <span class="co">// state initialization</span>
      <span class="ot">state</span>.<span class="fu">gremlin</span> = gremlin
      <span class="ot">state</span>.<span class="fu">edges</span> = graph[find_method](<span class="ot">gremlin</span>.<span class="fu">vertex</span>)        <span class="co">// get matching edges</span>
                         .<span class="fu">filter</span>(<span class="ot">Dagoba</span>.<span class="fu">filterEdges</span>(args[<span class="dv">0</span>]))
    }

    <span class="kw">if</span>(!<span class="ot">state</span>.<span class="ot">edges</span>.<span class="fu">length</span>)                                   <span class="co">// nothing more to do</span>
      <span class="kw">return</span> <span class="st">&#39;pull&#39;</span>

    <span class="kw">var</span> vertex = <span class="ot">state</span>.<span class="ot">edges</span>.<span class="fu">pop</span>()[edge_list]                 <span class="co">// use up an edge</span>
    <span class="kw">return</span> <span class="ot">Dagoba</span>.<span class="fu">gotoVertex</span>(<span class="ot">state</span>.<span class="fu">gremlin</span>, vertex)
  }
}</code></pre>

<p>The first couple of lines handle the differences between the in version and the out version. Then we're ready to return our pipetype function, which looks quite a bit like the vertex pipetype we just saw. That's a little surprising, since this one takes in a gremlin whereas the vertex pipetype creates gremlins <em>ex nihilo</em>.</p>

<p>Yet we can see the same beats being hit here, with the addition of a query initialization step. If there's no gremlin and we're out of available edges then we pull. If we have a gremlin but haven't yet set state then we find any edges going the appropriate direction and add them to our state. If there's a gremlin but its current vertex has no appropriate edges then we pull. And finally we pop off an edge and return a freshly cloned gremlin on the vertex to which it points.</p>

<p>Glancing at this code we see <code>!state.edges.length</code> repeated in each of the three clauses. It's tempting to refactor this to reduce the complexity of those conditionals. There are two issues keeping us from doing so.</p>

<p>One is relatively minor: the third <code>!state.edges.length</code> means something different from the first two, since <code>state.edges</code> has been changed between the second and third conditional. This actually encourages us to refactor, because having the same label mean two different things inside a single function usually isn't ideal.</p>

<p>The second is more serious. This isn't the only pipetype function we're writing, and we'll see these ideas of query initialization and/or state initialization repeated over and over. When writing code, there's always a balancing act between structured qualities and unstructured qualities. Too much structure and you pay a high cost in boilerplate and abstraction complexity. Too little structure and you'll have to keep all the plumbing minutiae in your head.</p>

<p>In this case, with a dozen or so pipetypes, the right choice seems to be to style each of the pipetype functions as similarly as possible, and label the constituent pieces with comments. So we resist our impulse to refactor this particular pipetype, because doing so would reduce uniformity, but we also resist the urge to engineer a formal structural abstraction for query initialization, state initialization, and the like. If there were hundreds of pipetypes that latter choice would probably be the right one: the complexity cost of the abstraction is constant, while the benefit accrues linearly with the number of units. When handling that many moving pieces, anything you can do to enforce regularity among them is helpful.</p>
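<p>Here is <code>simpleTraversal</code> driven by hand on a three-edge graph. Several pieces are simplified stand-ins of our own: the edge shape (<code>_label</code>, <code>_out</code> for the source vertex, <code>_in</code> for the target), the graph's edge lookups, and the helpers <code>Dagoba.makeGremlin</code>, <code>Dagoba.gotoVertex</code> and <code>Dagoba.filterEdges</code>. This <code>filterEdges</code> only understands "no filter" and "match this label". The traversal follows the code above, with the first test written as nested <code>if</code>s.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">var Dagoba = {}
Dagoba.makeGremlin = function(vertex, state) { return { vertex: vertex, state: state || {} } }
Dagoba.gotoVertex = function(gremlin, vertex) { return Dagoba.makeGremlin(vertex, gremlin.state) }
Dagoba.filterEdges = function(label) {                    // stand-in: label match only
  return function(edge) { return !label || edge._label === label }
}

Dagoba.simpleTraversal = function(dir) {
  var find_method = dir == 'out' ? 'findOutEdges' : 'findInEdges'
  var edge_list   = dir == 'out' ? '_in' : '_out'

  return function(graph, args, gremlin, state) {
    if(!gremlin) {                                        // query initialization
      if(!state.edges || !state.edges.length) return 'pull'
    }
    if(!state.edges || !state.edges.length) {             // state initialization
      state.gremlin = gremlin
      state.edges = graph[find_method](gremlin.vertex)     // filter copies, so pop is safe
                         .filter(Dagoba.filterEdges(args[0]))
    }
    if(!state.edges.length)                               // nothing more to do
      return 'pull'
    var vertex = state.edges.pop()[edge_list]             // use up an edge
    return Dagoba.gotoVertex(state.gremlin, vertex)
  }
}

function makeVertex(id) { return { _id: id, _out: [], _in: [] } }   // hypothetical toy graph
function addEdge(from, label, to) {
  var e = { _label: label, _out: from, _in: to }
  from._out.push(e)
  to._in.push(e)
}
var thor = makeVertex('Thor'), odin = makeVertex('Odin')
var jord = makeVertex('Jörð'), sif = makeVertex('Sif')
addEdge(thor, 'parent', odin)
addEdge(thor, 'parent', jord)
addEdge(thor, 'spouse', sif)

var graph = {
  findOutEdges: function(v) { return v._out },
  findInEdges:  function(v) { return v._in }
}

var out = Dagoba.simpleTraversal('out')
var state = {}
console.log(out(graph, ['parent'], false, state))           // pull: nothing to work on yet
var first  = out(graph, ['parent'], Dagoba.makeGremlin(thor), state)
var second = out(graph, ['parent'], false, state)           // reuses the stored edges
console.log(first.vertex._id, second.vertex._id)            // Jörð Odin
console.log(out(graph, ['parent'], false, state))           // pull: edges used up

var back = Dagoba.simpleTraversal('in')(graph, [], Dagoba.makeGremlin(odin), {})
console.log(back.vertex._id)                                // Thor</code></pre>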

<h4 id="property">Property</h4>

<p>Let's pause for a moment to consider an example query based on the three pipetypes we've seen. We can ask for Thor's grandparents like this<a href="#fn17" class="footnoteRef" id="fnref17"><sup>17</sup></a>:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">g</span>.<span class="fu">v</span>(<span class="st">&#39;Thor&#39;</span>).<span class="fu">out</span>(<span class="st">&#39;parent&#39;</span>).<span class="fu">out</span>(<span class="st">&#39;parent&#39;</span>).<span class="fu">run</span>()</code></pre>

<p>But what if we wanted their names? We could put a map on the end of that:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">g</span>.<span class="fu">v</span>(<span class="st">&#39;Thor&#39;</span>).<span class="fu">out</span>(<span class="st">&#39;parent&#39;</span>).<span class="fu">out</span>(<span class="st">&#39;parent&#39;</span>).<span class="fu">run</span>()
 .<span class="fu">map</span>(<span class="kw">function</span>(vertex) {<span class="kw">return</span> <span class="ot">vertex</span>.<span class="fu">name</span>})</code></pre>

<p>But this is a common enough operation that we'd prefer to write something more like:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">g</span>.<span class="fu">v</span>(<span class="st">&#39;Thor&#39;</span>).<span class="fu">out</span>(<span class="st">&#39;parent&#39;</span>).<span class="fu">out</span>(<span class="st">&#39;parent&#39;</span>).<span class="fu">property</span>(<span class="st">&#39;name&#39;</span>).<span class="fu">run</span>()</code></pre>

<p>Plus this way the property pipe is an integral part of the query, instead of something appended after. This has some interesting benefits, as we'll soon see.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="fu">addPipetype</span>(<span class="st">&#39;property&#39;</span>, <span class="kw">function</span>(graph, args, gremlin, state) {
  <span class="kw">if</span>(!gremlin) <span class="kw">return</span> <span class="st">&#39;pull&#39;</span>                                  <span class="co">// query initialization</span>
  <span class="ot">gremlin</span>.<span class="fu">result</span> = <span class="ot">gremlin</span>.<span class="fu">vertex</span>[args[<span class="dv">0</span>]]
  <span class="kw">return</span> <span class="ot">gremlin</span>.<span class="fu">result</span> == <span class="kw">null</span> ? <span class="kw">false</span> : gremlin             <span class="co">// false for bad props</span>
})</code></pre>

<p>Our query initialization here is trivial: if there's no gremlin, we pull. If there is a gremlin, we'll set its result to the property's value. Then the gremlin can continue onward. If it makes it through the last pipe its result will be collected and returned from the query. Not all gremlins have a <code>result</code> property. Those that don't return their most recently visited vertex.</p>

<p>Note that if the property doesn't exist we return <code>false</code> instead of the gremlin, so property pipes also act as a type of filter. Can you think of a use for this? What are the tradeoffs in this design decision?</p>
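<p>One use: asking for a property doubles as "keep only vertices that have it". The sketch below calls the property pipetype directly on two hand-built gremlins, where the <code>hammer</code> property is made-up test data. It then discards the <code>false</code> results, which is what lets the pipe act as a filter. Because the test is <code>== null</code>, values such as <code>0</code> or <code>''</code> still pass.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">var Dagoba = { Pipetypes: {} }
Dagoba.addPipetype = function(name, fun) { Dagoba.Pipetypes[name] = fun }

Dagoba.addPipetype('property', function(graph, args, gremlin, state) {
  if(!gremlin) return 'pull'                               // query initialization
  gremlin.result = gremlin.vertex[args[0]]
  return gremlin.result == null ? false : gremlin          // false for bad props
})

var property = Dagoba.Pipetypes.property
var gremlins = [                                           // hypothetical test data
  { vertex: { _id: 'Thor', name: 'Thor', hammer: 'Mjölnir' }, state: {} },
  { vertex: { _id: 'Loki', name: 'Loki' }, state: {} }
]

var hammers = gremlins
  .map(function(g) { return property(null, ['hammer'], g, {}) })
  .filter(Boolean)                                         // drop the false results
  .map(function(g) { return g.result })

console.log(hammers.join(','))                             // Mjölnir
console.log(property(null, ['name'], false, {}))           // pull</code></pre>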

<h4 id="unique">Unique</h4>

<p>If we want to collect all Thor's grandparents' grandchildren (his cousins, his siblings, and himself) we could do a query like this: <code>g.v('Thor').out().out().in().in().run()</code>. That would give us many duplicates, however. In fact there would be at least four copies of Thor himself. (Can you think of a time when there might be more?)</p>

<p>To resolve this we introduce a new pipetype called 'unique'. Our new query produces output in one-to-one correspondence with the grandchildren:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">g</span>.<span class="fu">v</span>(<span class="st">&#39;Thor&#39;</span>).<span class="fu">out</span>().<span class="fu">out</span>().<span class="fu">in</span>().<span class="fu">in</span>().<span class="fu">unique</span>().<span class="fu">run</span>()</code></pre>

<p>The pipetype implementation:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="fu">addPipetype</span>(<span class="st">&#39;unique&#39;</span>, <span class="kw">function</span>(graph, args, gremlin, state) {
  <span class="kw">if</span>(!gremlin) <span class="kw">return</span> <span class="st">&#39;pull&#39;</span>                                  <span class="co">// query initialization</span>
  <span class="kw">if</span>(state[<span class="ot">gremlin</span>.<span class="ot">vertex</span>.<span class="fu">_id</span>]) <span class="kw">return</span> <span class="st">&#39;pull&#39;</span>                 <span class="co">// reject repeats</span>
  state[<span class="ot">gremlin</span>.<span class="ot">vertex</span>.<span class="fu">_id</span>] = <span class="kw">true</span>
  <span class="kw">return</span> gremlin
})</code></pre>

<p>A unique pipe is purely a filter: it either passes the gremlin through unchanged or it tries to pull a new gremlin from the previous pipe.</p>

<p>We initialize by trying to collect a gremlin. If the gremlin's current vertex is in our cache, then we've seen it before so we try to collect a new one. Otherwise, we add the gremlin's current vertex to our cache and pass it along. Easy peasy.</p>
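<p>Because the cache lives in the pipe's own <code>state</code> object, it remembers every vertex ID that has passed through for as long as that object lives. Its memory therefore grows with the number of distinct vertices seen. A minimal demonstration, feeding the pipetype a hypothetical stream of gremlins and keeping whatever it does not reject:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">var Dagoba = { Pipetypes: {} }
Dagoba.addPipetype = function(name, fun) { Dagoba.Pipetypes[name] = fun }

Dagoba.addPipetype('unique', function(graph, args, gremlin, state) {
  if(!gremlin) return 'pull'                               // query initialization
  if(state[gremlin.vertex._id]) return 'pull'              // reject repeats
  state[gremlin.vertex._id] = true
  return gremlin
})

var unique = Dagoba.Pipetypes.unique
var state = {}                                             // one state object for the whole stream
var ids = ['Thor', 'Loki', 'Thor', 'Thor', 'Baldr']        // hypothetical incoming vertices
var kept = []
ids.forEach(function(id) {
  var result = unique(null, [], { vertex: { _id: id }, state: {} }, state)
  if (result !== 'pull') kept.push(result.vertex._id)
})

console.log(kept.join(','))                                // Thor,Loki,Baldr
console.log(Object.keys(state).length)                     // 3</code></pre>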

<h4 id="filter">Filter</h4>

<p>We've seen two simplistic ways of filtering, but sometimes we need more elaborate constraints. What if we want to find all of Thor's siblings whose weight is greater than their height <a href="#fn18" class="footnoteRef" id="fnref18"><sup>18</sup></a>? This query would give us our answer:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">g</span>.<span class="fu">v</span>(<span class="st">&#39;Thor&#39;</span>).<span class="fu">out</span>().<span class="fu">in</span>().<span class="fu">unique</span>()
 .<span class="fu">filter</span>(<span class="kw">function</span>(asgardian) { <span class="kw">return</span> <span class="ot">asgardian</span>.<span class="fu">weight</span> &gt; <span class="ot">asgardian</span>.<span class="fu">height</span> })
 .<span class="fu">run</span>()</code></pre>

<p>If we want to know which of Thor's siblings survive Ragnarök we can pass filter an object:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">g</span>.<span class="fu">v</span>(<span class="st">&#39;Thor&#39;</span>).<span class="fu">out</span>().<span class="fu">in</span>().<span class="fu">unique</span>().<span class="fu">filter</span>({<span class="dt">survives</span>: <span class="kw">true</span>}).<span class="fu">run</span>()</code></pre>

<p>Here's how it works:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="fu">addPipetype</span>(<span class="st">&#39;filter&#39;</span>, <span class="kw">function</span>(graph, args, gremlin, state) {
  <span class="kw">if</span>(!gremlin) <span class="kw">return</span> <span class="st">&#39;pull&#39;</span>                                  <span class="co">// query initialization</span>

  <span class="kw">if</span>(<span class="kw">typeof</span> args[<span class="dv">0</span>] == <span class="st">&#39;object&#39;</span>)                              <span class="co">// filter by object</span>
    <span class="kw">return</span> <span class="ot">Dagoba</span>.<span class="fu">objectFilter</span>(<span class="ot">gremlin</span>.<span class="fu">vertex</span>, args[<span class="dv">0</span>])
         ? gremlin : <span class="st">&#39;pull&#39;</span>

  <span class="kw">if</span>(<span class="kw">typeof</span> args[<span class="dv">0</span>] != <span class="st">&#39;function&#39;</span>) {
    <span class="ot">Dagoba</span>.<span class="fu">error</span>(<span class="st">&#39;Filter is not a function: &#39;</span> + args[<span class="dv">0</span>])
    <span class="kw">return</span> gremlin                                            <span class="co">// keep things moving</span>
  }

  <span class="kw">if</span>(!args[<span class="dv">0</span>](<span class="ot">gremlin</span>.<span class="fu">vertex</span>, gremlin)) <span class="kw">return</span> <span class="st">&#39;pull&#39;</span>         <span class="co">// gremlin fails filter</span>
  <span class="kw">return</span> gremlin
})</code></pre>

<p>If the filter's first argument is not an object or function then we trigger an error, and pass the gremlin along. Pause for a minute, and consider the alternatives. Why would we decide to continue the query once an error is encountered?</p>

<p>There are two reasons this error might arise. The first involves a programmer typing in a query, either in a REPL or directly in code. When run, that query will produce results, and also generate a programmer-observable error. The programmer then corrects the error to further filter the set of results produced. Alternatively, the system could display only the error and produce no results, and fixing all errors would allow results to be displayed.</p>

<p>The second possibility is that the filter is being applied dynamically at run time. This is a much more important case, because the person invoking the query is not necessarily the author of the query code. Because this is on the web, our default rule is to always show results, and to never break things. It is usually preferable to soldier on in the face of trouble rather than succumb to our wounds and present the user with a grisly error message.</p>

<p>For those occasions when showing too few results is better than showing too many, <code>Dagoba.error</code> can be overridden to throw an error, thereby circumventing the natural control flow.</p>
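<p>A sketch of both policies, using a trimmed copy of the filter pipetype with the object-filter branch left out to keep it self-contained. It also uses a hypothetical default <code>Dagoba.error</code> that records messages. With the default, a malformed filter lets gremlins through. After overriding <code>Dagoba.error</code> to throw, the same call aborts.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">var Dagoba = {}
var logged = []
Dagoba.error = function(msg) { logged.push(msg) }          // stand-in default: note it, carry on

function filterPipe(graph, args, gremlin, state) {         // filter pipetype, object branch omitted
  if(!gremlin) return 'pull'                               // query initialization
  if(typeof args[0] != 'function') {
    Dagoba.error('Filter is not a function: ' + args[0])
    return gremlin                                         // keep things moving
  }
  if(!args[0](gremlin.vertex, gremlin)) return 'pull'      // gremlin fails filter
  return gremlin
}

var gremlin = { vertex: { _id: 'Thor', survives: true }, state: {} }

var passed = filterPipe(null, ['survives'], gremlin, {})   // a string is not a valid filter
console.log(passed === gremlin, logged[0])                 // true Filter is not a function: survives

Dagoba.error = function(msg) { throw new Error(msg) }      // stricter policy
var threw = false
try {
  filterPipe(null, ['survives'], gremlin, {})
} catch (e) {
  threw = true
}
console.log(threw)                                         // true</code></pre>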

<h4 id="take">Take</h4>

<p>We don't always want all the results at once. Sometimes we only need a handful of results; say we want a dozen of Thor's contemporaries, so we walk all the way back to the primeval cow Auðumbla:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">g</span>.<span class="fu">v</span>(<span class="st">&#39;Thor&#39;</span>).<span class="fu">out</span>().<span class="fu">out</span>().<span class="fu">out</span>().<span class="fu">out</span>().<span class="fu">in</span>().<span class="fu">in</span>().<span class="fu">in</span>().<span class="fu">in</span>().<span class="fu">unique</span>().<span class="fu">take</span>(<span class="dv">12</span>).<span class="fu">run</span>()</code></pre>

<p>Without the <code>take</code> pipe that query could take quite a while to run, but thanks to our lazy evaluation strategy the query with the <code>take</code> pipe is very efficient.</p>

<p>Sometimes we just want one at a time: we'll process the result, work with it, and then come back for another one. This pipetype allows us to do that as well.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">q = <span class="ot">g</span>.<span class="fu">v</span>(<span class="st">&#39;Auðumbla&#39;</span>).<span class="fu">in</span>().<span class="fu">in</span>().<span class="fu">in</span>().<span class="fu">property</span>(<span class="st">&#39;name&#39;</span>).<span class="fu">take</span>(<span class="dv">1</span>)

<span class="ot">q</span>.<span class="fu">run</span>() <span class="co">// [&#39;Odin&#39;]</span>
<span class="ot">q</span>.<span class="fu">run</span>() <span class="co">// [&#39;Vili&#39;]</span>
<span class="ot">q</span>.<span class="fu">run</span>() <span class="co">// [&#39;Vé&#39;]</span>
<span class="ot">q</span>.<span class="fu">run</span>() <span class="co">// []</span></code></pre>

<p>Our query can function in an asynchronous environment, allowing us to collect more results as needed. When we run out, an empty array is returned.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="fu">addPipetype</span>(<span class="st">&#39;take&#39;</span>, <span class="kw">function</span>(graph, args, gremlin, state) {
  <span class="ot">state</span>.<span class="fu">taken</span> = <span class="ot">state</span>.<span class="fu">taken</span> || <span class="dv">0</span>                              <span class="co">// state initialization</span>

  <span class="kw">if</span>(<span class="ot">state</span>.<span class="fu">taken</span> == args[<span class="dv">0</span>]) {
    <span class="ot">state</span>.<span class="fu">taken</span> = <span class="dv">0</span>
    <span class="kw">return</span> <span class="st">&#39;done&#39;</span>                                             <span class="co">// all done</span>
  }

  <span class="kw">if</span>(!gremlin) <span class="kw">return</span> <span class="st">&#39;pull&#39;</span>                                  <span class="co">// query initialization</span>
  <span class="ot">state</span>.<span class="fu">taken</span>++
  <span class="kw">return</span> gremlin
})</code></pre>

<p>We initialize <code>state.taken</code> to zero if it doesn't already exist. JavaScript has implicit coercion, but coerces <code>undefined</code> into <code>NaN</code>, so we have to be explicit here <a href="#fn19" class="footnoteRef" id="fnref19"><sup>19</sup></a>.</p>

<p>Then when <code>state.taken</code> reaches <code>args[0]</code> we return 'done', sealing off the pipes before us. We also reset the <code>state.taken</code> counter, allowing us to repeat the query later.</p>

<p>We do those two steps before query initialization to handle the cases of <code>take(0)</code> and <code>take()</code> <a href="#fn20" class="footnoteRef" id="fnref20"><sup>20</sup></a>. Then we increment our counter and return the gremlin.</p>
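<p>These cases are easy to check by calling the take pipetype by hand with a shared <code>state</code> object; the gremlins here are placeholder objects. Note in passing that <code>undefined + 1</code> is <code>NaN</code>, which is why <code>state.taken</code> needs its default. Note also that <code>take()</code> never reports <code>'done'</code>, because <code>0 == undefined</code> is false.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">var Dagoba = { Pipetypes: {} }
Dagoba.addPipetype = function(name, fun) { Dagoba.Pipetypes[name] = fun }

Dagoba.addPipetype('take', function(graph, args, gremlin, state) {
  state.taken = state.taken || 0                           // state initialization
  if(state.taken == args[0]) {
    state.taken = 0
    return 'done'                                          // all done
  }
  if(!gremlin) return 'pull'                               // query initialization
  state.taken++
  return gremlin
})

var take = Dagoba.Pipetypes.take
var a = { id: 'a' }, b = { id: 'b' }                       // placeholder gremlins
var state = {}

console.log(take(null, [2], a, state) === a)               // true (taken: 1)
console.log(take(null, [2], b, state) === b)               // true (taken: 2)
console.log(take(null, [2], false, state))                 // done, and the counter resets
console.log(take(null, [2], a, state) === a)               // true: a second run starts fresh

console.log(take(null, [0], a, {}))                        // done before passing anything
console.log(take(null, [], a, {}) === a)                   // true: take() never says done
console.log(undefined + 1)                                 // NaN</code></pre>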

<h4 id="as">As</h4>

<p>These next four pipetypes work as a group to allow more advanced queries. This one just allows you to label the current vertex. We'll use that label with the next three pipetypes.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="fu">addPipetype</span>(<span class="st">&#39;as&#39;</span>, <span class="kw">function</span>(graph, args, gremlin, state) {
  <span class="kw">if</span>(!gremlin) <span class="kw">return</span> <span class="st">&#39;pull&#39;</span>                                  <span class="co">// query initialization</span>
  <span class="ot">gremlin</span>.<span class="ot">state</span>.<span class="fu">as</span> = <span class="ot">gremlin</span>.<span class="ot">state</span>.<span class="fu">as</span> || {}                   <span class="co">// init the &#39;as&#39; state</span>
  <span class="ot">gremlin</span>.<span class="ot">state</span>.<span class="fu">as</span>[args[<span class="dv">0</span>]] = <span class="ot">gremlin</span>.<span class="fu">vertex</span>                  <span class="co">// set label to vertex</span>
  <span class="kw">return</span> gremlin
})</code></pre>

<p>After initializing the query, we ensure the gremlin's local state has an <code>as</code> object. Then we store the gremlin's current vertex in that object under the label given by <code>args[0]</code>.</p>
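<p>Here is a sketch of what that label map looks like after one call. It assumes gremlins are plain <code>{vertex, state}</code> objects, as built by <code>makeGremlin</code> in the Helpers section, and uses a hypothetical standalone <code>asPipe</code> that copies the logic above.</p>

```javascript
// Minimal gremlin constructor, matching the shape used throughout the chapter.
function makeGremlin(vertex, state) {
  return { vertex: vertex, state: state || {} }
}

// Hypothetical standalone copy of the 'as' pipetype logic.
function asPipe(graph, args, gremlin, state) {
  if (!gremlin) return 'pull'                    // nothing to label yet
  gremlin.state.as = gremlin.state.as || {}      // create the label map on first use
  gremlin.state.as[args[0]] = gremlin.vertex     // label name -> current vertex
  return gremlin
}

var thor = { _id: 'Thor' }
var gremlin = asPipe(null, ['me'], makeGremlin(thor), {})
gremlin.state.as.me === thor   // true: the label points at the vertex object itself
```

<p>The map stores the vertex object rather than its <code>_id</code>, so later steps can compare vertices by identity.</p>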

<h4 id="merge">Merge</h4>

<p>Once we've labeled vertices we can extract them using merge. If we want Thor's parents, grandparents and great-grandparents we can do something like this:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">g</span>.<span class="fu">v</span>(<span class="st">&#39;Thor&#39;</span>).<span class="fu">out</span>().<span class="fu">as</span>(<span class="st">&#39;parent&#39;</span>).<span class="fu">out</span>().<span class="fu">as</span>(<span class="st">&#39;grandparent&#39;</span>).<span class="fu">out</span>().<span class="fu">as</span>(<span class="st">&#39;great-grandparent&#39;</span>)
           .<span class="fu">merge</span>(<span class="st">&#39;parent&#39;</span>, <span class="st">&#39;grandparent&#39;</span>, <span class="st">&#39;great-grandparent&#39;</span>).<span class="fu">run</span>()</code></pre>

<p>Here's the merge pipetype:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="fu">addPipetype</span>(<span class="st">&#39;merge&#39;</span>, <span class="kw">function</span>(graph, args, gremlin, state) {
  <span class="kw">if</span>(!<span class="ot">state</span>.<span class="fu">vertices</span> &amp;&amp; !gremlin) <span class="kw">return</span> <span class="st">&#39;pull&#39;</span>               <span class="co">// query initialization</span>

  <span class="kw">if</span>(!<span class="ot">state</span>.<span class="fu">vertices</span> || !<span class="ot">state</span>.<span class="ot">vertices</span>.<span class="fu">length</span>) {             <span class="co">// state initialization</span>
    <span class="kw">var</span> obj = (<span class="ot">gremlin</span>.<span class="fu">state</span>||{}).<span class="fu">as</span> || {}
    <span class="ot">state</span>.<span class="fu">vertices</span> = <span class="ot">args</span>.<span class="fu">map</span>(<span class="kw">function</span>(id) {<span class="kw">return</span> obj[id]}).<span class="fu">filter</span>(Boolean)
    <span class="ot">state</span>.<span class="fu">gremlinState</span> = <span class="ot">gremlin</span>.<span class="fu">state</span>           <span class="co">// later calls arrive with gremlin = false</span>
  }

  <span class="kw">if</span>(!<span class="ot">state</span>.<span class="ot">vertices</span>.<span class="fu">length</span>) <span class="kw">return</span> <span class="st">&#39;pull&#39;</span>                    <span class="co">// done with this batch</span>

  <span class="kw">var</span> vertex = <span class="ot">state</span>.<span class="ot">vertices</span>.<span class="fu">pop</span>()
  <span class="kw">return</span> <span class="ot">Dagoba</span>.<span class="fu">makeGremlin</span>(vertex, <span class="ot">state</span>.<span class="fu">gremlinState</span>) <span class="co">// every clone keeps the labels</span>
})</code></pre>

<p>We map over each argument, looking for it in the gremlin's list of labeled vertices. If we find it, we clone the gremlin to that vertex. Note that only gremlins that make it to this pipe are included in the merge—if Thor's mother's parents aren't in the graph, she won't be in the result set.</p>
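<p>The interpreter hands merge a gremlin only after merge asks for one; for every later clone in the batch it calls merge again with <code>false</code>. The sketch below runs that back-and-forth by hand. It is a standalone approximation, not the Dagoba code: <code>mergePipe</code>, <code>collect</code>, and the <code>gremlinState</code> field are hypothetical, and the sketch saves the incoming gremlin's state in the step's state so that every clone in the batch keeps its labels, not just the first.</p>

```javascript
function makeGremlin(vertex, state) {
  return { vertex: vertex, state: state || {} }
}

// Hypothetical standalone merge; 'gremlinState' keeps the incoming gremlin's state
// for every clone, because later calls in a batch arrive with gremlin === false.
function mergePipe(graph, args, gremlin, state) {
  if (!state.vertices && !gremlin) return 'pull'            // query initialization
  if (!state.vertices || !state.vertices.length) {          // start a new batch
    var labels = (gremlin && gremlin.state && gremlin.state.as) || {}
    state.vertices = args.map(function(id) { return labels[id] }).filter(Boolean)
    state.gremlinState = gremlin && gremlin.state
  }
  if (!state.vertices.length) return 'pull'                 // batch exhausted
  return makeGremlin(state.vertices.pop(), state.gremlinState)
}

// Mimic the interpreter: pass one gremlin in, then keep calling with false until 'pull'.
function collect(args, gremlin) {
  var state = {}, out = []
  var result = mergePipe(null, args, gremlin, state)
  while (result != 'pull') {
    out.push(result)
    result = mergePipe(null, args, false, state)
  }
  return out
}

var labelled = makeGremlin('Thor', { as: { parent: 'Odin', grandparent: 'Bor' } })
var merged = collect(['parent', 'grandparent', 'great-grandparent'], labelled)
merged.map(function(g) { return g.vertex })   // ['Bor', 'Odin']
```

<p>The missing 'great-grandparent' label is dropped by <code>filter(Boolean)</code>, and <code>pop()</code> takes vertices from the end of the list, so they come out in reverse argument order.</p>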

<h4 id="except">Except</h4>

<p>We've already seen cases where we would like to say &quot;Give me all Thor's siblings who are not Thor&quot;. We can do that with a filter:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">g</span>.<span class="fu">v</span>(<span class="st">&#39;Thor&#39;</span>).<span class="fu">out</span>().<span class="fu">in</span>().<span class="fu">unique</span>()
           .<span class="fu">filter</span>(<span class="kw">function</span>(asgardian) {<span class="kw">return</span> <span class="ot">asgardian</span>.<span class="fu">_id</span> != <span class="st">&#39;Thor&#39;</span>}).<span class="fu">run</span>()</code></pre>

<p>It's more straightforward with <code>as</code> and <code>except</code>:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">g</span>.<span class="fu">v</span>(<span class="st">&#39;Thor&#39;</span>).<span class="fu">as</span>(<span class="st">&#39;me&#39;</span>).<span class="fu">out</span>().<span class="fu">in</span>().<span class="fu">except</span>(<span class="st">&#39;me&#39;</span>).<span class="fu">unique</span>().<span class="fu">run</span>()</code></pre>

<p>But there are also queries that would be difficult to express with a filter. What if we wanted Thor's uncles and aunts? How would we filter out his parents? It's easy with <code>as</code> and <code>except</code> <a href="#fn21" class="footnoteRef" id="fnref21"><sup>21</sup></a>:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">g</span>.<span class="fu">v</span>(<span class="st">&#39;Thor&#39;</span>).<span class="fu">out</span>().<span class="fu">as</span>(<span class="st">&#39;parent&#39;</span>).<span class="fu">out</span>().<span class="fu">in</span>().<span class="fu">except</span>(<span class="st">&#39;parent&#39;</span>).<span class="fu">unique</span>().<span class="fu">run</span>()</code></pre>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="fu">addPipetype</span>(<span class="st">&#39;except&#39;</span>, <span class="kw">function</span>(graph, args, gremlin, state) {
  <span class="kw">if</span>(!gremlin) <span class="kw">return</span> <span class="st">&#39;pull&#39;</span>                                  <span class="co">// query initialization</span>
  <span class="kw">if</span>(<span class="ot">gremlin</span>.<span class="fu">vertex</span> == <span class="ot">gremlin</span>.<span class="ot">state</span>.<span class="fu">as</span>[args[<span class="dv">0</span>]]) <span class="kw">return</span> <span class="st">&#39;pull&#39;</span>
  <span class="kw">return</span> gremlin
})</code></pre>

<p>Here we're checking whether the current vertex is the same one we stored under that label earlier. If it is, we return 'pull' to discard this gremlin and ask the previous pipe for another.</p>
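<p>One detail worth seeing in action: the check uses <code>==</code> on vertex objects, so it compares identity, not IDs. The hypothetical sketch below copies the except logic and feeds it gremlins that share one labelled state, as clones of a single gremlin do.</p>

```javascript
// Hypothetical standalone copy of the 'except' pipetype logic.
function exceptPipe(graph, args, gremlin, state) {
  if (!gremlin) return 'pull'                                    // query initialization
  if (gremlin.vertex == gremlin.state.as[args[0]]) return 'pull' // same object: discard
  return gremlin
}

var thor = { _id: 'Thor' }, baldr = { _id: 'Baldr' }   // hypothetical vertex objects
var shared = { as: { me: thor } }                      // state after .as('me') on Thor

exceptPipe(null, ['me'], { vertex: thor, state: shared }, {})   // 'pull': Thor is skipped
exceptPipe(null, ['me'], { vertex: baldr, state: shared }, {})  // the Baldr gremlin passes
// A different object with the same _id is NOT filtered out:
exceptPipe(null, ['me'], { vertex: { _id: 'Thor' }, state: shared }, {})
```

<p>That works as long as the graph hands out one object per vertex; a copied vertex would slip through.</p>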

<h4 id="back">Back</h4>

<p>Some of the questions we might ask involve checking further into the graph, only to return later to our point of origin if the answer is in the affirmative. Suppose we wanted to know which of Fjörgynn's daughters had children with one of Bestla's sons.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">g</span>.<span class="fu">v</span>(<span class="st">&#39;Fjörgynn&#39;</span>).<span class="fu">in</span>().<span class="fu">as</span>(<span class="st">&#39;me&#39;</span>)       <span class="co">// first gremlin&#39;s state.as is Frigg</span>
 .<span class="fu">in</span>()                              <span class="co">// first gremlin&#39;s vertex is now Baldr</span>
 .<span class="fu">out</span>().<span class="fu">out</span>()                       <span class="co">// clone that gremlin for each grandparent</span>
 .<span class="fu">filter</span>({<span class="dt">_id</span>: <span class="st">&#39;Bestla&#39;</span>})           <span class="co">// keep only the gremlin on grandparent Bestla</span>
 .<span class="fu">back</span>(<span class="st">&#39;me&#39;</span>).<span class="fu">unique</span>().<span class="fu">run</span>()         <span class="co">// jump gremlin&#39;s vertex back to Frigg and exit</span></code></pre>

<p>Here's the definition for <code>back</code>:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="fu">addPipetype</span>(<span class="st">&#39;back&#39;</span>, <span class="kw">function</span>(graph, args, gremlin, state) {
  <span class="kw">if</span>(!gremlin) <span class="kw">return</span> <span class="st">&#39;pull&#39;</span>                                  <span class="co">// query initialization</span>
  <span class="kw">return</span> <span class="ot">Dagoba</span>.<span class="fu">gotoVertex</span>(gremlin, <span class="ot">gremlin</span>.<span class="ot">state</span>.<span class="fu">as</span>[args[<span class="dv">0</span>]])
})</code></pre>

<p>We're using the <code>Dagoba.gotoVertex</code> helper function to do all the real work here. Let's take a look at that and some other helpers now.</p>

<h2 id="helpers">Helpers</h2>

<p>The pipetypes above rely on a few helpers to do their jobs. Let's take a quick look at those before diving into the interpreter.</p>

<h4 id="gremlins">Gremlins</h4>

<p>Gremlins are simple creatures: they have a current vertex and some local state. So to make a new one we just need to make an object with those two things.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="fu">makeGremlin</span> = <span class="kw">function</span>(vertex, state) {
  <span class="kw">return</span> {<span class="dt">vertex</span>: vertex, <span class="dt">state</span>: state || {} }
}</code></pre>

<p>Any object that has a vertex property and a state property is a gremlin by this definition, so we could just inline the constructor, but wrapping it in a function allows us to add new properties to all gremlins in a single place.</p>

<p>We can also take an existing gremlin and send it to a new vertex, as we saw in the <code>back</code> pipetype and the <code>simpleTraversal</code> function.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="fu">gotoVertex</span> = <span class="kw">function</span>(gremlin, vertex) {               <span class="co">// clone the gremlin</span>
  <span class="kw">return</span> <span class="ot">Dagoba</span>.<span class="fu">makeGremlin</span>(vertex, <span class="ot">gremlin</span>.<span class="fu">state</span>)
}</code></pre>

<p>Note that this function actually returns a brand new gremlin: a clone of the old one, sent to our desired destination. That means a gremlin can sit on a vertex while its clones are sent out to explore many other vertices. This is exactly what happens in <code>simpleTraversal</code>.</p>
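<p>"Clone" here is a shallow notion: the new gremlin is a fresh object, but its <code>state</code> is the very same object as the original's. A quick sketch, using standalone copies of the two helpers above:</p>

```javascript
// Standalone copies of the two helpers above.
function makeGremlin(vertex, state) {
  return { vertex: vertex, state: state || {} }
}
function gotoVertex(gremlin, vertex) {
  return makeGremlin(vertex, gremlin.state)
}

var original = makeGremlin('Thor')
var clone = gotoVertex(original, 'Odin')

clone !== original                 // true: a new gremlin object
clone.state === original.state     // true: both point at one state object
clone.state.as = { me: 'Odin' }    // so a label set through the clone...
original.state.as.me               // ...is visible through the original: 'Odin'
```

<p>That sharing is cheap, and it is how labels set by <code>as</code> travel along with a gremlin's clones, but it also means siblings cloned from one gremlin see each other's writes.</p>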

<p>As an example of possible enhancements, we could add a bit of state to keep track of every vertex the gremlin visits, and add new pipetypes to take advantage of those paths.</p>
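<p>As a sketch of that enhancement, assuming a hypothetical <code>gotoVertexWithPath</code> helper and a <code>path</code> key in gremlin state: since <code>gotoVertex</code> gives clones the same state object, each clone here instead gets a shallow copy of the state with its own extended path, so siblings don't overwrite each other's history. Nested objects such as the <code>as</code> map are still shared.</p>

```javascript
function makeGremlin(vertex, state) {
  return { vertex: vertex, state: state || {} }
}

// Hypothetical: like gotoVertex, but records every vertex visited in state.path.
function gotoVertexWithPath(gremlin, vertex) {
  var state = {}
  for (var key in gremlin.state) state[key] = gremlin.state[key]   // shallow copy
  var soFar = gremlin.state.path || [gremlin.vertex]               // start with the origin
  state.path = soFar.concat([vertex])                              // new array, old one untouched
  return makeGremlin(vertex, state)
}

var start = makeGremlin('Thor')
var parent = gotoVertexWithPath(start, 'Odin')
var grandpa = gotoVertexWithPath(parent, 'Bor')
var grandma = gotoVertexWithPath(parent, 'Bestla')

grandpa.state.path   // ['Thor', 'Odin', 'Bor']
grandma.state.path   // ['Thor', 'Odin', 'Bestla']
```

<p>A new pipetype could then read <code>gremlin.state.path</code>, for example to return the route instead of the final vertex. Copying state on every hop costs time and memory that the shared-state <code>gotoVertex</code> avoids.</p>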

<h4 id="finding">Finding</h4>

<p>The <code>vertex</code> pipetype uses the <code>findVertices</code> function to collect a set of initial vertices from which to begin our query.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="ot">G</span>.<span class="fu">findVertices</span> = <span class="kw">function</span>(args) {                      <span class="co">// vertex finder helper</span>
  <span class="kw">if</span>(<span class="kw">typeof</span> args[<span class="dv">0</span>] == <span class="st">&#39;object&#39;</span>)
    <span class="kw">return</span> <span class="kw">this</span>.<span class="fu">searchVertices</span>(args[<span class="dv">0</span>])
  <span class="kw">else</span> <span class="kw">if</span>(<span class="ot">args</span>.<span class="fu">length</span> == <span class="dv">0</span>)
    <span class="kw">return</span> <span class="kw">this</span>.<span class="ot">vertices</span>.<span class="fu">slice</span>()                              <span class="co">// OPT: slice is costly</span>
  <span class="kw">else</span>
    <span class="kw">return</span> <span class="kw">this</span>.<span class="fu">findVerticesByIds</span>(args)
}</code></pre>

<p>This function receives its arguments as a list. If the first one is an object it passes it to <code>searchVertices</code>, allowing queries like:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">  <span class="ot">g</span>.<span class="fu">v</span>({<span class="dt">_id</span>:<span class="st">&#39;Thor&#39;</span>}).<span class="fu">run</span>()
  <span class="ot">g</span>.<span class="fu">v</span>({<span class="dt">species</span>: <span class="st">&#39;Aesir&#39;</span>}).<span class="fu">run</span>()</code></pre>

<p>Otherwise, if there are arguments, they get passed to <code>findVerticesByIds</code>, which handles queries like <code>g.v('Thor', 'Odin').run()</code>.</p>

<p>If there are no arguments at all, then our query looks like <code>g.v().run()</code>. This isn't something you'll want to do frequently with large graphs, especially since we're slicing the vertex list before returning it. We slice because some call sites manipulate the returned list directly by popping items off as they work through them. We could optimize this use case by cloning at the call site, or by avoiding those manipulations. (We could keep a counter in state instead of popping.)</p>
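<p>The counter idea might look like the sketch below. It covers only the no-argument case, and <code>vertexPipeByIndex</code> is a hypothetical pipetype, not part of Dagoba: it keeps an index in its step state and reads the graph's own vertex list without slicing or popping it.</p>

```javascript
function makeGremlin(vertex, state) {
  return { vertex: vertex, state: state || {} }
}

// Hypothetical pipetype for g.v() with no arguments: walks graph.vertices by index.
function vertexPipeByIndex(graph, args, gremlin, state) {
  if (state.index == null) state.index = 0            // first call for this run
  if (state.index >= graph.vertices.length) {         // every vertex has been emitted
    state.index = null                                // reset so the query can rerun
    return 'done'
  }
  var vertex = graph.vertices[state.index++]
  return makeGremlin(vertex, gremlin && gremlin.state)
}

var graph = { vertices: [{ _id: 'Thor' }, { _id: 'Odin' }, { _id: 'Frigg' }] }
var state = {}, ids = [], result
while ((result = vertexPipeByIndex(graph, [], false, state)) != 'done')
  ids.push(result.vertex._id)
ids                    // ['Thor', 'Odin', 'Frigg']
graph.vertices.length  // still 3: the list was never copied or mutated
```

<p>One tradeoff: because the list is shared, vertices added to the graph while such a query is running would show up in that query.</p>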

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="ot">G</span>.<span class="fu">findVerticesByIds</span> = <span class="kw">function</span>(ids) {
  <span class="kw">if</span>(<span class="ot">ids</span>.<span class="fu">length</span> == <span class="dv">1</span>) {
    <span class="kw">var</span> maybe_vertex = <span class="kw">this</span>.<span class="fu">findVertexById</span>(ids[<span class="dv">0</span>])            <span class="co">// maybe it&#39;s a vertex</span>
    <span class="kw">return</span> maybe_vertex ? [maybe_vertex] : []                 <span class="co">// or maybe it isn&#39;t</span>
  }

  <span class="kw">return</span> <span class="ot">ids</span>.<span class="fu">map</span>( <span class="kw">this</span>.<span class="ot">findVertexById</span>.<span class="fu">bind</span>(<span class="kw">this</span>) ).<span class="fu">filter</span>(Boolean)
}

<span class="ot">Dagoba</span>.<span class="ot">G</span>.<span class="fu">findVertexById</span> = <span class="kw">function</span>(vertex_id) {
  <span class="kw">return</span> <span class="kw">this</span>.<span class="fu">vertexIndex</span>[vertex_id]
}</code></pre>

<p>Note the use of <code>vertexIndex</code> here. Without that index we'd have to go through each vertex in our list one at a time to decide if it matched the ID—turning a constant time operation into a linear time one, and any <span class="math">\(O(n)\)</span> operations that directly rely on it into <span class="math">\(O(n^2)\)</span> operations.</p>
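<p>A small sketch of the difference, using a hypothetical three-vertex list. The index here is built with <code>Object.create(null)</code>, a choice made for this sketch: with a plain <code>{}</code>, a lookup for an ID like <code>'toString'</code> would find the inherited method instead of returning <code>undefined</code>.</p>

```javascript
var vertices = [{ _id: 'Thor' }, { _id: 'Odin' }, { _id: 'Frigg' }]

// Build the index once: O(n) up front, then each lookup is a single property access.
var vertexIndex = Object.create(null)     // no prototype, so no inherited keys
vertices.forEach(function(vertex) { vertexIndex[vertex._id] = vertex })

function findByIndex(id) {
  return vertexIndex[id]
}

// Without an index, every lookup may have to walk the whole list: O(n).
function findByScan(id) {
  for (var i = 0; i < vertices.length; i++)
    if (vertices[i]._id === id) return vertices[i]
}

findByIndex('Odin') === findByScan('Odin')   // true: same vertex object
findByIndex('toString')                      // undefined, thanks to Object.create(null)
```

<p>A JavaScript <code>Map</code> would sidestep the inherited-key issue as well.</p>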

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="ot">G</span>.<span class="fu">searchVertices</span> = <span class="kw">function</span>(filter) {        <span class="co">// match on filter&#39;s properties</span>
  <span class="kw">return</span> <span class="kw">this</span>.<span class="ot">vertices</span>.<span class="fu">filter</span>(<span class="kw">function</span>(vertex) {
    <span class="kw">return</span> <span class="ot">Dagoba</span>.<span class="fu">objectFilter</span>(vertex, filter)
  })
}</code></pre>

<p>The <code>searchVertices</code> function uses the <code>objectFilter</code> helper on every vertex in the graph. We'll look at <code>objectFilter</code> in the next section, but in the meantime, can you think of a way to search through the vertices lazily?</p>
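<p>One possible answer, sketched under the assumption of an ES2015 runtime: a generator that tests vertices only when the next match is requested. <code>searchVerticesLazily</code> is a hypothetical name, and the vertex pipetype would need to be adapted to call <code>next()</code> instead of popping from an array. The <code>objectFilter</code> copy matches the definition given in the next section.</p>

```javascript
// Copy of objectFilter: every key in the filter must match exactly.
function objectFilter(thing, filter) {
  for (var key in filter)
    if (thing[key] !== filter[key]) return false
  return true
}

// Hypothetical lazy search: yields one matching vertex per next() call.
function* searchVerticesLazily(vertices, filter) {
  for (var i = 0; i < vertices.length; i++)
    if (objectFilter(vertices[i], filter)) yield vertices[i]
}

var vertices = [
  { _id: 'Thor', species: 'Aesir' },
  { _id: 'Ymir', species: 'Jotun' },
  { _id: 'Odin', species: 'Aesir' }
]
var matches = searchVerticesLazily(vertices, { species: 'Aesir' })
matches.next().value._id   // 'Thor'; Ymir and Odin have not been tested yet
```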

<h4 id="filtering">Filtering</h4>

<p>We saw that <code>simpleTraversal</code> uses a filtering function on the edges it encounters. It's a simple function, but powerful enough for our purposes.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="fu">filterEdges</span> = <span class="kw">function</span>(filter) {
  <span class="kw">return</span> <span class="kw">function</span>(edge) {
    <span class="kw">if</span>(!filter)                                 <span class="co">// no filter: everything is valid</span>
      <span class="kw">return</span> <span class="kw">true</span>

    <span class="kw">if</span>(<span class="kw">typeof</span> filter == <span class="st">&#39;string&#39;</span>)               <span class="co">// string filter: label must match</span>
      <span class="kw">return</span> <span class="ot">edge</span>.<span class="fu">_label</span> == filter

    <span class="kw">if</span>(<span class="ot">Array</span>.<span class="fu">isArray</span>(filter))                   <span class="co">// array filter: must contain label</span>
      <span class="kw">return</span> !!~<span class="ot">filter</span>.<span class="fu">indexOf</span>(<span class="ot">edge</span>.<span class="fu">_label</span>)

    <span class="kw">return</span> <span class="ot">Dagoba</span>.<span class="fu">objectFilter</span>(edge, filter)    <span class="co">// object filter: check edge keys</span>
  }
}</code></pre>

<p>The first case is no filter at all: <code>g.v('Odin').in().run()</code> traverses all edges to Odin.</p>

<p>The second case filters on the edge's label: <code>g.v('Odin').in('parent').run()</code> traverses those edges with a label of 'parent'.</p>

<p>The third case accepts an array of labels: <code>g.v('Odin').in(['parent', 'spouse']).run()</code> traverses both parent and spouse edges.</p>

<p>And the fourth case uses the <code>objectFilter</code> function mentioned earlier:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="fu">objectFilter</span> = <span class="kw">function</span>(thing, filter) {
  <span class="kw">for</span>(<span class="kw">var</span> key <span class="kw">in</span> filter)
    <span class="kw">if</span>(thing[key] !== filter[key])
      <span class="kw">return</span> <span class="kw">false</span>

  <span class="kw">return</span> <span class="kw">true</span>
}</code></pre>

<p>This allows us to query the edge using a filter object:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">g</span>.<span class="fu">v</span>(<span class="st">&#39;Odin&#39;</span>).<span class="fu">in</span>({<span class="dt">_label</span>: <span class="st">&#39;spouse&#39;</span>, <span class="dt">order</span>: <span class="dv">2</span>}).<span class="fu">run</span>()    <span class="co">// finds Odin&#39;s second wife</span></code></pre>
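<p>To check all four cases side by side, here is a sketch that runs copies of <code>filterEdges</code> and <code>objectFilter</code> over a few hypothetical edges pointing in to Odin. The edge objects are made up for this example and carry only a <code>_label</code>, a <code>from</code> name, and sometimes an <code>order</code>. This copy spells the array check as <code>indexOf(...) !== -1</code>, which is equivalent to the <code>!!~</code> idiom above.</p>

```javascript
function objectFilter(thing, filter) {
  for (var key in filter)
    if (thing[key] !== filter[key]) return false
  return true
}

function filterEdges(filter) {
  return function(edge) {
    if (!filter) return true                                  // no filter
    if (typeof filter == 'string') return edge._label == filter
    if (Array.isArray(filter)) return filter.indexOf(edge._label) !== -1
    return objectFilter(edge, filter)                         // object filter
  }
}

// Hypothetical edges into Odin; 'from' names the vertex at the other end.
var edges = [
  { _label: 'parent', from: 'Thor' },
  { _label: 'parent', from: 'Baldr' },
  { _label: 'spouse', from: 'Frigg', order: 1 },
  { _label: 'spouse', from: 'Jörð', order: 2 },
  { _label: 'sibling', from: 'Vili' }
]
function names(filter) {
  return edges.filter(filterEdges(filter)).map(function(e) { return e.from })
}

names()                                   // all five
names('parent')                           // ['Thor', 'Baldr']
names(['parent', 'spouse'])               // everything except Vili
names({ _label: 'spouse', order: 2 })     // ['Jörð']
```

<p>Note that an empty object filter matches every edge, since <code>objectFilter</code> has no keys with which to reject anything.</p>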

<h2 id="the-interpreters-nature">The Interpreter's Nature</h2>

<p>We've arrived at the top of the narrative mountain, ready to receive our prize: the interpreter. The code is actually fairly compact, but the model has a bit of subtlety.</p>

<p>We compared programs to pipelines earlier, and that's a good mental model for writing queries. As we saw, though, we need a different model for the actual implementation. That model is more like a Turing machine than a pipeline: there's a read/write head that sits over a particular step. It &quot;reads&quot; the step, changes its &quot;state&quot;, and then moves either right or left.</p>

<p>Reading the step means evaluating the pipetype function. As we saw above, each of those functions accepts as input the entire graph, its own arguments, maybe a gremlin, and its own local state. As output it provides a gremlin, false, or a signal of 'pull' or 'done'. This output is what our quasi-Turing machine reads in order to change the machine's state.</p>

<p>That state comprises just two variables: one to record steps that are 'done', and another to record the <code>results</code> of the query. Those are updated, and then either the machine head moves or the query finishes and the result is returned.</p>

<p>We've now described all the state in our machine. We'll have a list of results that starts empty:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">  <span class="kw">var</span> results = []</code></pre>

<p>An index of the last 'done' step that starts behind the first step:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">  <span class="kw">var</span> done = -<span class="dv">1</span></code></pre>

<p>We need a place to store the most recent step's output, which might be a gremlin—or it might be nothing—so we'll call it <code>maybe_gremlin</code>:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">  <span class="kw">var</span> maybe_gremlin = <span class="kw">false</span></code></pre>

<p>And finally we'll need a program counter to indicate the position of the read/write head.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">  <span class="kw">var</span> pc = <span class="kw">this</span>.<span class="ot">program</span>.<span class="fu">length</span> - <span class="dv">1</span></code></pre>

<p>Except... wait a second. How are we going to get lazy <a href="#fn22" class="footnoteRef" id="fnref22"><sup>22</sup></a>? The traditional way of building a lazy system out of an eager one is to store parameters to function calls as &quot;thunks&quot; instead of evaluating them. You can think of a thunk as an unevaluated expression. In JS, which has first-class functions and closures, we can create a thunk by wrapping a function and its arguments in a new anonymous function which takes no arguments:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="kw">function</span> <span class="fu">sum</span>() {
  <span class="kw">return</span> [].<span class="ot">slice</span>.<span class="fu">call</span>(arguments).<span class="fu">reduce</span>(<span class="kw">function</span>(acc, n) { <span class="kw">return</span> acc + (n|<span class="dv">0</span>) }, <span class="dv">0</span>)
}

<span class="kw">function</span> <span class="fu">thunk_of_sum_1_2_3</span>() { <span class="kw">return</span> <span class="fu">sum</span>(<span class="dv">1</span>, <span class="dv">2</span>, <span class="dv">3</span>) }

<span class="kw">function</span> <span class="fu">thunker</span>(fun, args) {
  <span class="kw">return</span> <span class="kw">function</span>() {<span class="kw">return</span> <span class="ot">fun</span>.<span class="fu">apply</span>(fun, args)}
}

<span class="kw">function</span> <span class="fu">thunk_wrapper</span>(fun) {
  <span class="kw">return</span> <span class="kw">function</span>() {
    <span class="kw">return</span> <span class="ot">thunker</span>.<span class="fu">apply</span>(<span class="kw">null</span>, [fun].<span class="fu">concat</span>([[].<span class="ot">slice</span>.<span class="fu">call</span>(arguments)]))
  }
}

<span class="fu">sum</span>(<span class="dv">1</span>, <span class="dv">2</span>, <span class="dv">3</span>)              <span class="co">// -&gt; 6</span>
<span class="fu">thunk_of_sum_1_2_3</span>()      <span class="co">// -&gt; 6</span>
<span class="fu">thunker</span>(sum, [<span class="dv">1</span>, <span class="dv">2</span>, <span class="dv">3</span>])() <span class="co">// -&gt; 6</span>

<span class="kw">var</span> sum2 = <span class="fu">thunk_wrapper</span>(sum)
<span class="kw">var</span> thunk = <span class="fu">sum2</span>(<span class="dv">1</span>, <span class="dv">2</span>, <span class="dv">3</span>)
<span class="fu">thunk</span>()                   <span class="co">// -&gt; 6</span></code></pre>

<p>None of the thunks are invoked until one is actually needed, which usually implies some type of output is required: in our case the result of a query. Each time the interpreter encounters a new function call, we wrap it in a thunk. Recall our original formulation of a query: <code>children(children(children(parents(parents(parents([8]))))))</code>. Each of those layers would be a thunk, wrapped up like an onion.</p>

<p>There are a couple of tradeoffs with this approach: one is that spatial performance becomes more difficult to reason about, because of the potentially vast thunk graphs that can be created. Another is that our program is now expressed as a single thunk, and we can't do much with it at that point.</p>

<p>This second point isn't usually an issue, because of the phase separation between when our compiler runs its optimizations and when all the thunking occurs at runtime. In our case we don't have that advantage: because we're using method chaining to implement a fluent interface <a href="#fn23" class="footnoteRef" id="fnref23"><sup>23</sup></a>, if we also used thunks to achieve laziness we would thunk each new method as it is called, which means that by the time we get to <code>run()</code> we would have only a thunk as our input, and no way to optimize our query.</p>

<p>Interestingly, our fluent interface hides another difference between our query language and regular programming languages. The query <code>g.v('Thor').in().out().run()</code> could be rewritten as <code>run(out(in(v(g, 'Thor'))))</code> if we weren't using method chaining. In JS we would first process <code>g</code> and <code>'Thor'</code>, then <code>v</code>, then <code>in</code>, <code>out</code> and <code>run</code>, working from the inside out. In a language with non-strict semantics we would work from the outside in, processing each consecutive nested layer of arguments only as needed.</p>

<p>So if we start evaluating our query at the end of the statement, with <code>run</code>, and work our way back to <code>v('Thor')</code>, calculating results only as needed, then we've effectively achieved non-strictness. The secret is in the linearity of our queries. Branches complicate the process graph and also introduce opportunities for duplicate calls, which require memoization to avoid wasted work. The simplicity of our query language means we can implement an equally simple interpreter based on our linear read/write head model.</p>
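<p>Memoization here means a thunk that caches its value the first time it is forced, so a shared subexpression is computed only once (often called call-by-need). A minimal sketch, where <code>memoThunker</code> is a hypothetical variant of the <code>thunker</code> function above:</p>

```javascript
// Hypothetical memoizing version of thunker: evaluates fun at most once.
function memoThunker(fun, args) {
  var evaluated = false, value
  return function() {
    if (!evaluated) {
      value = fun.apply(null, args)   // first call: do the work
      evaluated = true
    }
    return value                      // later calls: reuse the cached result
  }
}

var calls = 0
function countedSum(a, b) { calls++; return a + b }

var shared = memoThunker(countedSum, [2, 3])
shared() + shared()   // 10
calls                 // 1: the second call hit the cache
```

<p>Because our queries are linear, the interpreter can do without this extra bookkeeping.</p>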

<p>In addition to allowing runtime optimizations, this style has many other benefits related to the ease of instrumentation: history, reversibility, stepwise debugging, query statistics. All these are easy to add dynamically because we control the interpreter and have left it as a virtual machine evaluator instead of reducing the program to a single thunk.</p>

<h2 id="interpreter-unveiled">Interpreter, Unveiled</h2>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="ot">Q</span>.<span class="fu">run</span> = <span class="kw">function</span>() {                 <span class="co">// a machine for query processing</span>

  <span class="kw">var</span> max = <span class="kw">this</span>.<span class="ot">program</span>.<span class="fu">length</span> - <span class="dv">1</span>         <span class="co">// index of the last step in the program</span>
  <span class="kw">var</span> maybe_gremlin = <span class="kw">false</span>                 <span class="co">// a gremlin, a signal string, or false</span>
  <span class="kw">var</span> results = []                          <span class="co">// results for this particular run</span>
  <span class="kw">var</span> done = -<span class="dv">1</span>                             <span class="co">// behind which things have finished</span>
  <span class="kw">var</span> pc = max                              <span class="co">// our program counter</span>

  <span class="kw">var</span> step, state, pipetype

  <span class="kw">while</span>(done &lt; max) {
    <span class="kw">var</span> ts = <span class="kw">this</span>.<span class="fu">state</span>
    step = <span class="kw">this</span>.<span class="fu">program</span>[pc]                 <span class="co">// step is a pair of pipetype and args</span>
    state = (ts[pc] = ts[pc] || {})         <span class="co">// this step&#39;s state must be an object</span>
    pipetype = <span class="ot">Dagoba</span>.<span class="fu">getPipetype</span>(step[<span class="dv">0</span>])  <span class="co">// a pipetype is just a function</span></code></pre>

<p>Here <code>max</code> is just a constant, and <code>step</code>, <code>state</code>, and <code>pipetype</code> cache information about the current step. We've entered the driver loop, and we won't stop until the last step is done.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">    maybe_gremlin = <span class="fu">pipetype</span>(<span class="kw">this</span>.<span class="fu">graph</span>, step[<span class="dv">1</span>], maybe_gremlin, state)</code></pre>

<p>Here we call the step's pipetype function with the graph, the step's arguments, the current <code>maybe_gremlin</code>, and the step's local state.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">    <span class="kw">if</span>(maybe_gremlin == <span class="st">&#39;pull&#39;</span>) {           <span class="co">// &#39;pull&#39; means the pipe wants more input</span>
      maybe_gremlin = <span class="kw">false</span>
      <span class="kw">if</span>(pc<span class="dv">-1</span> &gt; done) {
        pc--                                <span class="co">// try the previous pipe</span>
        <span class="kw">continue</span>
      } <span class="kw">else</span> {
        done = pc                           <span class="co">// previous pipe is done, so we are too</span>
      }
    }</code></pre>

<p>To handle the 'pull' case we first set <code>maybe_gremlin</code> <a href="#fn24" class="footnoteRef" id="fnref24"><sup>24</sup></a> to false. We're overloading our 'maybe' here by using it as a channel to pass the 'pull' and 'done' signals, but once one of those signals is sucked out we go back to thinking of this as a proper 'maybe'.</p>

<p>If the step before us isn't 'done' <a href="#fn25" class="footnoteRef" id="fnref25"><sup>25</sup></a> we'll move the head backward and try again. Otherwise, we mark ourselves as 'done' and let the head naturally fall forward.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">    <span class="kw">if</span>(maybe_gremlin == <span class="st">&#39;done&#39;</span>) {           <span class="co">// &#39;done&#39; tells us the pipe is finished</span>
      maybe_gremlin = <span class="kw">false</span>
      done = pc
    }</code></pre>

<p>Handling the 'done' case is even easier: set <code>maybe_gremlin</code> to false and mark this step as 'done'.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">    pc++                                    <span class="co">// move on to the next pipe</span>

    <span class="kw">if</span>(pc &gt; max) {
      <span class="kw">if</span>(maybe_gremlin)
        <span class="ot">results</span>.<span class="fu">push</span>(maybe_gremlin)         <span class="co">// a gremlin popped out of the pipeline</span>
      maybe_gremlin = <span class="kw">false</span>
      pc--                                  <span class="co">// take a step back</span>
    }
  }</code></pre>

<p>We're done with the current step, and we've moved the head to the next one. If we're at the end of the program and <code>maybe_gremlin</code> contains a gremlin, we'll add it to the results, set <code>maybe_gremlin</code> to false and move the head back to the last step in the program.</p>

<p>This is also the initialization state, since <code>pc</code> starts as <code>max</code>. So we start here and work our way back, and end up here again at least once for each final result the query returns.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">  results = <span class="ot">results</span>.<span class="fu">map</span>(<span class="kw">function</span>(gremlin) { <span class="co">// return projected results, or vertices</span>
    <span class="kw">return</span> <span class="ot">gremlin</span>.<span class="fu">result</span> != <span class="kw">null</span>
         ? <span class="ot">gremlin</span>.<span class="fu">result</span> : <span class="ot">gremlin</span>.<span class="fu">vertex</span> } )

  <span class="kw">return</span> results
}</code></pre>

<p>We're out of the driver loop now: the query has ended, the results are in, and we just need to process and return them. If any gremlin has its result set we'll return that, otherwise we'll return the gremlin's final vertex. Are there other things we might want to return? What are the tradeoffs here?</p>
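<p>To watch the head move back and forth, here is a self-contained toy version of the driver loop with two simplified pipetypes. Everything in it is a stand-in: <code>pipetypes</code>, <code>run</code>, and the <code>vertex</code> pipetype (which here just emits every vertex of a plain <code>{vertices: [...]}</code> object) are hypothetical, while the loop body and the <code>take</code> logic follow the code shown above.</p>

```javascript
function makeGremlin(vertex, state) {
  return { vertex: vertex, state: state || {} }
}

// Hypothetical, much-reduced pipetype table.
var pipetypes = {
  vertex: function(graph, args, gremlin, state) {      // emit each vertex once (from the end)
    if (!state.vertices) state.vertices = graph.vertices.slice()
    if (!state.vertices.length) return 'done'
    return makeGremlin(state.vertices.pop(), gremlin && gremlin.state)
  },
  take: function(graph, args, gremlin, state) {        // same logic as the take pipetype
    state.taken = state.taken || 0
    if (state.taken == args[0]) { state.taken = 0; return 'done' }
    if (!gremlin) return 'pull'
    state.taken++
    return gremlin
  }
}

// The driver loop from above, with the program and step states passed in explicitly.
function run(graph, program) {
  var stepStates = []
  var max = program.length - 1, maybe_gremlin = false, results = [], done = -1, pc = max
  while (done < max) {
    var step = program[pc]
    var state = (stepStates[pc] = stepStates[pc] || {})
    maybe_gremlin = pipetypes[step[0]](graph, step[1], maybe_gremlin, state)
    if (maybe_gremlin == 'pull') {                     // step wants input
      maybe_gremlin = false
      if (pc - 1 > done) { pc--; continue }            // back up to the previous step
      done = pc                                        // nothing left upstream
    }
    if (maybe_gremlin == 'done') { maybe_gremlin = false; done = pc }
    pc++
    if (pc > max) {                                    // fell off the end
      if (maybe_gremlin) results.push(maybe_gremlin)
      maybe_gremlin = false
      pc--
    }
  }
  return results.map(function(g) { return g.result != null ? g.result : g.vertex })
}

var graph = { vertices: ['Thor', 'Odin', 'Frigg'] }
run(graph, [['vertex', []], ['take', [2]]])   // ['Frigg', 'Odin']
run(graph, [['vertex', []], ['take', [5]]])   // ['Frigg', 'Odin', 'Thor']
```

<p>Because <code>take(2)</code> reports 'done' right after its second result, the third vertex is never popped: upstream work stops as soon as the last step is finished.</p>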

<h2 id="query-transformers">Query Transformers</h2>

<p>Now we have a nice compact interpreter for our query programs, but we're still missing something. Every modern DBMS comes with a query optimizer as an essential part of the system. For non-relational databases, optimizing our query plan rarely yields the exponential speedups seen in their relational cousins <a href="#fn26" class="footnoteRef" id="fnref26"><sup>26</sup></a>, but it's still an important aspect of database design.</p>

<p>What's the simplest thing we could do that could reasonably be called a query optimizer? Well, we could write little functions for transforming our query programs before we run them. We'll pass a program in as input and get a different program back out as output.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="fu">T</span> = []                               <span class="co">// transformers (more than meets the eye)</span>

<span class="ot">Dagoba</span>.<span class="fu">addTransformer</span> = <span class="kw">function</span>(fun, priority) {
  <span class="kw">if</span>(<span class="kw">typeof</span> fun != <span class="st">&#39;function&#39;</span>)
    <span class="kw">return</span> <span class="ot">Dagoba</span>.<span class="fu">error</span>(<span class="st">&#39;Invalid transformer function&#39;</span>)

  <span class="kw">for</span>(<span class="kw">var</span> i = <span class="dv">0</span>; i &lt; <span class="ot">Dagoba</span>.<span class="ot">T</span>.<span class="fu">length</span>; i++)  <span class="co">// OPT: binary search</span>
    <span class="kw">if</span>(priority &gt; <span class="ot">Dagoba</span>.<span class="fu">T</span>[i].<span class="fu">priority</span>) <span class="kw">break</span>

  <span class="ot">Dagoba</span>.<span class="ot">T</span>.<span class="fu">splice</span>(i, <span class="dv">0</span>, {<span class="dt">priority</span>: priority, <span class="dt">fun</span>: fun})
}</code></pre>

<p>Now we can add query transformers to our system. A query transformer is a function that accepts a program and returns a program, and we register it together with a priority level. Higher-priority transformers are placed closer to the front of the list, so they run first. We're ensuring <code>fun</code> is a function, because we're going to evaluate it later <a href="#fn27" class="footnoteRef" id="fnref27"><sup>27</sup></a>.</p>

<p>We'll assume there won't be an enormous number of transformer additions, and walk the list linearly to add a new one. We'll leave a note in case this assumption turns out to be false: a binary search is much faster for long lists, but adds a little complexity and doesn't really speed up short lists.</p>
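<p>If the linear walk ever did become a bottleneck, the <code>OPT: binary search</code> note could be acted on roughly as shown below. <code>insertByPriority</code> is a hypothetical helper and is not part of Dagoba. It keeps the list in descending priority order. Like the linear loop, it places a new transformer after any existing ones with the same priority.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">// Hypothetical helper: insert {priority, fun} into a list sorted by
// descending priority, using binary search to find the slot.
function insertByPriority(list, priority, fun) {
  var lo = 0, hi = list.length
  while (lo !== hi) {
    var mid = Math.floor((lo + hi) / 2)
    // entries with priority at least ours stay in front of the new one
    if (list[mid].priority >= priority) lo = mid + 1
    else hi = mid
  }
  list.splice(lo, 0, {priority: priority, fun: fun})
  return lo                              // index where the transformer landed
}

var T = []
insertByPriority(T, 10,  function a(p) { return p })
insertByPriority(T, 100, function b(p) { return p })
insertByPriority(T, 10,  function c(p) { return p })
// T in order of fun.name: ['b', 'a', 'c']</code></pre>

<p><code>splice</code> still has to shift the later entries. The binary search therefore saves comparisons but not moves, which is one more reason the linear walk is fine for short lists.</p>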

<p>To run these transformers we're going to inject a single line of code into the top of our interpreter:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="ot">Q</span>.<span class="fu">run</span> = <span class="kw">function</span>() {                     <span class="co">// our virtual machine for querying</span>
  <span class="kw">this</span>.<span class="fu">program</span> = <span class="ot">Dagoba</span>.<span class="fu">transform</span>(<span class="kw">this</span>.<span class="fu">program</span>) <span class="co">// activate the transformers</span></code></pre>

<p>That line calls <code>Dagoba.transform</code>, which passes our program through each transformer in turn:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="fu">transform</span> = <span class="kw">function</span>(program) {
  <span class="kw">return</span> <span class="ot">Dagoba</span>.<span class="ot">T</span>.<span class="fu">reduce</span>(<span class="kw">function</span>(acc, transformer) {
    <span class="kw">return</span> <span class="ot">transformer</span>.<span class="fu">fun</span>(acc)
  }, program)
}</code></pre>
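<p>Here is a small, hypothetical optimizing transformer that makes the idea concrete. As in Dagoba, it assumes a program is an array of <code>[pipetype, args]</code> steps. It collapses runs of adjacent <code>unique</code> steps into one, because a second <code>unique</code> right after the first can't remove anything the first didn't. The registry is stubbed out so the block runs on its own, and the priority of 10 is arbitrary.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">// Minimal stand-in for the registry shown above (not the full Dagoba object).
var Dagoba = {T: []}
Dagoba.addTransformer = function(fun, priority) {
  for (var i = 0; Dagoba.T.length > i; i++)
    if (priority > Dagoba.T[i].priority) break
  Dagoba.T.splice(i, 0, {priority: priority, fun: fun})
}
Dagoba.transform = function(program) {
  return Dagoba.T.reduce(function(acc, transformer) {
    return transformer.fun(acc)
  }, program)
}

// Hypothetical optimizer: unique().unique() filters exactly like unique(),
// so drop any 'unique' step that immediately follows another one.
Dagoba.addTransformer(function(program) {
  return program.filter(function(step, i) {
    if (step[0] !== 'unique') return true       // leave other steps alone
    if (i === 0) return true                    // nothing before it to compare
    return program[i - 1][0] !== 'unique'       // keep only the first of a run
  })
}, 10)

var before = [['vertex', ['Thor']], ['out', []], ['unique', []],
              ['unique', []], ['in', []]]
var after = Dagoba.transform(before)
// after is [['vertex', ['Thor']], ['out', []], ['unique', []], ['in', []]]</code></pre>

<p>The transformer builds a new array with <code>filter</code> rather than editing the program in place, so it can be tested on its own with nothing but a sample program.</p>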

<p>Up until this point, our engine has traded performance for simplicity, but one of the nice things about this strategy is that it leaves doors open for global optimizations that might have been unavailable if we had opted to optimize locally as we designed the system.</p>

<p>Optimizing a program can often increase complexity and reduce the elegance of the system, making it harder to reason about and maintain. Breaking abstraction barriers for performance gains is one of the more egregious forms of optimization, but even something seemingly innocuous like embedding performance-oriented code into business logic makes maintenance more difficult.</p>

<p>In light of that, this type of &quot;orthogonal optimization&quot; is particularly appealing. We can add optimizers in modules or even user code, instead of having them tightly coupled to the engine. We can test them in isolation, or in groups, and with the addition of generative testing we could even automate that process, ensuring that our available optimizers play nicely together.</p>

<p>We can also use this transformer system to add new functionality unrelated to optimization. Let's look at a case of that now.</p>

<h2 id="aliases">Aliases</h2>

<p>Making a query like <code>g.v('Thor').out().in()</code> is quite compact, but are these Thor's siblings or his mates? Neither interpretation is fully satisfying. It'd be nicer to say what we mean: either <code>g.v('Thor').parents().children()</code> or <code>g.v('Thor').children().parents()</code>.</p>

<p>We can use query transformers to make aliases with just a couple of extra helper functions:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="fu">addAlias</span> = <span class="kw">function</span>(newname, oldname, defaults) {
  defaults = defaults || []                     <span class="co">// default arguments for the alias</span>
  <span class="ot">Dagoba</span>.<span class="fu">addTransformer</span>(<span class="kw">function</span>(program) {
    <span class="kw">return</span> <span class="ot">program</span>.<span class="fu">map</span>(<span class="kw">function</span>(step) {
      <span class="kw">if</span>(step[<span class="dv">0</span>] != newname) <span class="kw">return</span> step
      <span class="kw">return</span> [oldname, <span class="ot">Dagoba</span>.<span class="fu">extend</span>(step[<span class="dv">1</span>], defaults)]
    })
  }, <span class="dv">100</span>)                                       <span class="co">// 100 because aliases run early</span>

  <span class="ot">Dagoba</span>.<span class="fu">addPipetype</span>(newname, <span class="kw">function</span>() {})
}</code></pre>

<p>We're adding a new name for an existing step, so we'll need to create a query transformer that converts the new name to the old name whenever it's encountered. We'll also need to add the new name as a method on the main query object, so it can be pulled into the query program.</p>

<p>If we could capture missing method calls and route them to a handler function then we might be able to run this transformer with a lower priority, but there's currently no way to do that. Instead we will run it with a high priority of 100 so the aliased methods are added before they are invoked.</p>

<p>We call another helper to merge the incoming step's arguments with the alias's default arguments. If the incoming step is missing an argument then we'll use the alias's argument for that slot.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="fu">extend</span> = <span class="kw">function</span>(list, defaults) {
  <span class="kw">return</span> <span class="ot">Object</span>.<span class="fu">keys</span>(defaults).<span class="fu">reduce</span>(<span class="kw">function</span>(acc, key) {
    <span class="kw">if</span>(<span class="kw">typeof</span> list[key] != <span class="st">&#39;undefined&#39;</span>) <span class="kw">return</span> acc
    acc[key] = defaults[key]
    <span class="kw">return</span> acc
  }, list)
}</code></pre>

<p>Now we can make those aliases we wanted:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="fu">addAlias</span>(<span class="st">&#39;parents&#39;</span>, <span class="st">&#39;out&#39;</span>)
<span class="ot">Dagoba</span>.<span class="fu">addAlias</span>(<span class="st">&#39;children&#39;</span>, <span class="st">&#39;in&#39;</span>)</code></pre>

<p>We can also start to specialize our data model a little more, by labeling each edge between a parent and child as a 'parent' edge. Then our aliases would look like this:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="fu">addAlias</span>(<span class="st">&#39;parents&#39;</span>, <span class="st">&#39;out&#39;</span>, [<span class="st">&#39;parent&#39;</span>])
<span class="ot">Dagoba</span>.<span class="fu">addAlias</span>(<span class="st">&#39;children&#39;</span>, <span class="st">&#39;in&#39;</span>, [<span class="st">&#39;parent&#39;</span>])</code></pre>

<p>Now we can add edges for spouses, step-parents, or even jilted ex-lovers. If we enhance our <code>addAlias</code> function we can introduce new aliases for grandparents, siblings, or even cousins:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="fu">addAlias</span>(<span class="st">&#39;grandparents&#39;</span>, [ [<span class="st">&#39;out&#39;</span>, <span class="st">&#39;parent&#39;</span>], [<span class="st">&#39;out&#39;</span>, <span class="st">&#39;parent&#39;</span>]])
<span class="ot">Dagoba</span>.<span class="fu">addAlias</span>(<span class="st">&#39;siblings&#39;</span>,     [ [<span class="st">&#39;as&#39;</span>, <span class="st">&#39;me&#39;</span>], [<span class="st">&#39;out&#39;</span>, <span class="st">&#39;parent&#39;</span>]
                                , [<span class="st">&#39;in&#39;</span>, <span class="st">&#39;parent&#39;</span>], [<span class="st">&#39;except&#39;</span>, <span class="st">&#39;me&#39;</span>]])
<span class="ot">Dagoba</span>.<span class="fu">addAlias</span>(<span class="st">&#39;cousins&#39;</span>,      [ [<span class="st">&#39;out&#39;</span>, <span class="st">&#39;parent&#39;</span>], [<span class="st">&#39;as&#39;</span>, <span class="st">&#39;folks&#39;</span>]
                                , [<span class="st">&#39;out&#39;</span>, <span class="st">&#39;parent&#39;</span>], [<span class="st">&#39;in&#39;</span>, <span class="st">&#39;parent&#39;</span>]
                                , [<span class="st">&#39;except&#39;</span>, <span class="st">&#39;folks&#39;</span>], [<span class="st">&#39;in&#39;</span>, <span class="st">&#39;parent&#39;</span>]
                                , [<span class="st">&#39;unique&#39;</span>]])</code></pre>

<p>That <code>cousins</code> alias is kind of cumbersome. Maybe we could expand our <code>addAlias</code> function to allow ourselves to use other aliases in our aliases, and call it like this:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="fu">addAlias</span>(<span class="st">&#39;cousins&#39;</span>,      [ <span class="st">&#39;parents&#39;</span>, [<span class="st">&#39;as&#39;</span>, <span class="st">&#39;folks&#39;</span>]
                                , <span class="st">&#39;parents&#39;</span>, <span class="st">&#39;children&#39;</span>
                                , [<span class="st">&#39;except&#39;</span>, <span class="st">&#39;folks&#39;</span>], <span class="st">&#39;children&#39;</span>, <span class="st">&#39;unique&#39;</span>])</code></pre>

<p>Now instead of</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">g</span>.<span class="fu">v</span>(<span class="st">&#39;Forseti&#39;</span>).<span class="fu">parents</span>().<span class="fu">as</span>(<span class="st">&#39;parents&#39;</span>).<span class="fu">parents</span>().<span class="fu">children</span>()
                        .<span class="fu">except</span>(<span class="st">&#39;parents&#39;</span>).<span class="fu">children</span>().<span class="fu">unique</span>()</code></pre>

<p>we can just say <code>g.v('Forseti').cousins()</code>.</p>

<p>We've introduced a bit of a pickle, though: while our <code>addAlias</code> function is resolving an alias it also has to resolve other aliases. What if <code>parents</code> called some other alias, and while we were resolving <code>cousins</code> we had to stop to resolve <code>parents</code> and then resolve its aliases and so on? What if one of the aliases used by <code>parents</code> ultimately called <code>cousins</code>?</p>

<p>This brings us into the realm of dependency resolution<a href="#fn28" class="footnoteRef" id="fnref28"><sup>28</sup></a>, a core component of modern package managers. There are a lot of fancy tricks for choosing ideal versions, tree shaking, general optimizations and the like, but the basic idea is fairly simple. We're going to make a graph of all the dependencies and their relationships, and then try to find a way to line up the vertices while making all the arrows go from left to right. If we can, then this particular sorting of the vertices is called a 'topological ordering', and we've proven that our dependency graph has no cycles: it is a Directed Acyclic Graph (DAG). If we fail to do so then our graph has at least one cycle.</p>
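<p>A minimal sketch of that idea follows. It uses Kahn's algorithm, which is our choice here; any topological sort would do. It assumes a hypothetical <code>aliases</code> map in which a string entry that is itself an alias name marks a dependency. The function returns the aliases in an order where each one comes after every alias it uses, or throws if no such order exists. Expanding aliases in that order means each expansion only ever sees aliases that are already fully resolved.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">// Hypothetical: `aliases` maps a name to a list of entries; a string entry
// that is itself an alias name marks a dependency on that alias.
function aliasOrder(aliases) {
  var names = Object.keys(aliases)
  var isAlias = function(entry) {
    return typeof entry === 'string'
      ? Object.prototype.hasOwnProperty.call(aliases, entry) : false
  }
  var pending = {}      // alias name -> count of unresolved alias dependencies
  var users = {}        // alias name -> aliases that depend on it
  names.forEach(function(name) { pending[name] = 0; users[name] = [] })

  names.forEach(function(name) {
    aliases[name].filter(isAlias).forEach(function(dep) {
      pending[name]++
      users[dep].push(name)
    })
  })

  // Kahn's algorithm: repeatedly emit an alias whose dependencies are all emitted
  var ready = names.filter(function(name) { return pending[name] === 0 })
  var order = []
  while (ready.length) {
    var next = ready.shift()
    order.push(next)
    users[next].forEach(function(user) {
      pending[user]--
      if (pending[user] === 0) ready.push(user)
    })
  }

  if (order.length !== names.length) {
    var stuck = names.filter(function(name) { return pending[name] !== 0 })
    throw new Error('alias cycle; unresolved: ' + stuck.join(', '))
  }
  return order
}

var family = {
  parents:  [['out', 'parent']],
  children: [['in', 'parent']],
  cousins:  ['parents', ['as', 'folks'], 'parents', 'children',
             ['except', 'folks'], 'children', 'unique']
}
var order = aliasOrder(family)   // ['parents', 'children', 'cousins']</code></pre>

<p>Strings such as <code>'unique'</code> that aren't alias names are treated as ordinary pipetypes. An alias that (directly or indirectly) uses itself never reaches zero pending dependencies, which is exactly what the final length check detects.</p>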

<p>On the other hand, we expect that our queries will generally be rather short (100 steps would be a very long query) and that we'll have a reasonably low number of transformers. Instead of fiddling around with DAGs and dependency management, we could have each transformer also report whether it changed anything, and then keep rerunning the transformers until a full pass stops being productive. This requires each transformer to be idempotent, but that's a useful property for transformers to have. What are the pros and cons of these two pathways?</p>
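<p>The fixed-point alternative can be sketched in a few lines. As an assumption of our own, each transformer here returns <code>{program, changed}</code> instead of a bare program. We keep making passes until a full pass changes nothing. The <code>maxPasses</code> cap is also our addition. A cycle, such as an alias of <code>parents</code> that eventually calls <code>cousins</code>, would otherwise expand forever, so the cap doubles as a crude cycle detector.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">// Hypothetical transformer shape: function(program) returns {program, changed}
function transformToFixpoint(program, transformers, maxPasses) {
  maxPasses = maxPasses || 100
  var passes = 0
  var changed = true
  while (changed) {
    if (passes === maxPasses) throw new Error('transformers did not settle')
    passes++
    changed = false
    transformers.forEach(function(transformer) {
      var result = transformer(program)
      program = result.program
      if (result.changed) changed = true
    })
  }
  return program
}

// Example transformer: expands one level of aliases per pass.
function aliasPass(aliases) {
  return function(program) {
    var changed = false
    var next = program.reduce(function(acc, step) {
      if (!Object.prototype.hasOwnProperty.call(aliases, step[0]))
        return acc.concat([step])
      changed = true
      return acc.concat(aliases[step[0]])
    }, [])
    return {program: next, changed: changed}
  }
}

var aliases = {
  parents:  [['out', ['parent']]],
  children: [['in', ['parent']]],
  cousins:  [['parents', []], ['as', ['folks']], ['parents', []], ['children', []],
             ['except', ['folks']], ['children', []], ['unique', []]]
}
var expanded = transformToFixpoint([['vertex', ['Forseti']], ['cousins', []]],
                                   [aliasPass(aliases)], 10)
// step names: vertex, out, as, out, in, except, in, unique</code></pre>

<p>The <code>cousins</code> query settles after three passes. The first expands <code>cousins</code>, the second expands the <code>parents</code> and <code>children</code> it introduced, and the third finds nothing left to do.</p>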

<h2 id="performance">Performance</h2>

<p>All production graph databases share a particular performance characteristic: graph traversal queries are constant time with respect to total graph size <a href="#fn29" class="footnoteRef" id="fnref29"><sup>29</sup></a>. In a non-graph database, asking for the list of someone's friends can require time proportional to the number of entries, because in the naive worst case you have to look at every entry. This means if a query over ten entries takes a millisecond, then a query over ten billion entries, a billion times as many, will take a billion milliseconds: almost two weeks. Your friend list would arrive faster if sent by Pony Express <a href="#fn30" class="footnoteRef" id="fnref30"><sup>30</sup></a>!</p>

<p>To alleviate this dismal performance most databases index over oft-queried fields, which turns an <span class="math">\(O(n)\)</span> search into an <span class="math">\(O(\log n)\)</span> search. This gives considerably better search performance, but at the cost of some write performance and a lot of space: indices can easily double the size of a database. Careful balancing of the space/time tradeoffs of indices is part of the perpetual tuning process for most databases.</p>

<p>Graph databases sidestep this issue by making direct connections between vertices and edges, so graph traversals are just pointer jumps; no need to scan through every item, no need for indices, no extra work at all. Now finding your friends has the same price regardless of the total number of people in the graph, with no additional space cost or write time cost. One downside to this approach is that the pointers work best when the whole graph is in memory on the same machine. Effectively sharding a graph database across multiple machines is still an active area of research <a href="#fn31" class="footnoteRef" id="fnref31"><sup>31</sup></a>.</p>

<p>We can see this at work in the microcosm of Dagoba if we replace the functions for finding edges. Here's a naive version that searches through all the edges in linear time. It's similar to our very first implementation, but uses all the structures we've since built.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="ot">G</span>.<span class="fu">findInEdges</span>  = <span class="kw">function</span>(vertex) {
  <span class="kw">return</span> <span class="kw">this</span>.<span class="ot">edges</span>.<span class="fu">filter</span>(<span class="kw">function</span>(edge) {<span class="kw">return</span> <span class="ot">edge</span>.<span class="ot">_in</span>.<span class="fu">_id</span>  == <span class="ot">vertex</span>.<span class="fu">_id</span>} )
}
<span class="ot">Dagoba</span>.<span class="ot">G</span>.<span class="fu">findOutEdges</span> = <span class="kw">function</span>(vertex) {
  <span class="kw">return</span> <span class="kw">this</span>.<span class="ot">edges</span>.<span class="fu">filter</span>(<span class="kw">function</span>(edge) {<span class="kw">return</span> <span class="ot">edge</span>.<span class="ot">_out</span>.<span class="fu">_id</span> == <span class="ot">vertex</span>.<span class="fu">_id</span>} )
}</code></pre>

<p>We can add an index for edges, which gets us most of the way there with small graphs but has all the classic indexing issues for large ones.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="ot">G</span>.<span class="fu">findInEdges</span>  = <span class="kw">function</span>(vertex) { <span class="kw">return</span> <span class="kw">this</span>.<span class="fu">inEdgeIndex</span> [<span class="ot">vertex</span>.<span class="fu">_id</span>] }
<span class="ot">Dagoba</span>.<span class="ot">G</span>.<span class="fu">findOutEdges</span> = <span class="kw">function</span>(vertex) { <span class="kw">return</span> <span class="kw">this</span>.<span class="fu">outEdgeIndex</span>[<span class="ot">vertex</span>.<span class="fu">_id</span>] }</code></pre>
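<p>Those two lookups assume something keeps <code>inEdgeIndex</code> and <code>outEdgeIndex</code> up to date, and the chapter doesn't show that part. Below is a minimal sketch using a hypothetical stand-in graph and <code>addEdge</code>. Every write now pays for two extra index updates, which is the write-time and space cost mentioned earlier. Unlike the lookups above, these return an empty list for a vertex with no edges.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">// Hypothetical stand-in graph with edge indices keyed by vertex id.
var G = {edges: [], inEdgeIndex: Object.create(null), outEdgeIndex: Object.create(null)}

function addToIndex(index, id, edge) {
  if (!index[id]) index[id] = []
  index[id].push(edge)
}

G.addEdge = function(edge) {              // edge._in and edge._out are vertex objects
  this.edges.push(edge)
  addToIndex(this.inEdgeIndex,  edge._in._id,  edge)   // extra work on every write
  addToIndex(this.outEdgeIndex, edge._out._id, edge)   // and extra space per edge
}

G.findInEdges  = function(vertex) { return this.inEdgeIndex[vertex._id]  || [] }
G.findOutEdges = function(vertex) { return this.outEdgeIndex[vertex._id] || [] }

var odin = {_id: 'Odin'}, thor = {_id: 'Thor'}, baldr = {_id: 'Baldr'}
G.addEdge({_out: thor,  _in: odin, _label: 'parent'})
G.addEdge({_out: baldr, _in: odin, _label: 'parent'})
// G.findInEdges(odin).length is 2; G.findOutEdges(odin).length is 0</code></pre>

<p>Deleting an edge would need the mirror-image bookkeeping in both indices. This is part of why indices carry "all the classic indexing issues".</p>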

<p>And here we have our old friends back again: pure, sweet index-free adjacency.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="ot">G</span>.<span class="fu">findInEdges</span>  = <span class="kw">function</span>(vertex) { <span class="kw">return</span> <span class="ot">vertex</span>.<span class="fu">_in</span>  }
<span class="ot">Dagoba</span>.<span class="ot">G</span>.<span class="fu">findOutEdges</span> = <span class="kw">function</span>(vertex) { <span class="kw">return</span> <span class="ot">vertex</span>.<span class="fu">_out</span> }</code></pre>
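<p>Index-free adjacency only works because each edge is wired into its endpoints when it's created. The sketch below shows that wiring with a hypothetical <code>connect</code> helper and plain objects rather than Dagoba's own graph constructor. It also checks that the pointer lookup agrees with the linear scan.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">// Hypothetical: vertices carry their own edge lists, so lookups are pointer hops.
function makeVertex(id) { return {_id: id, _in: [], _out: []} }

function connect(edges, from, to, label) {
  var edge = {_out: from, _in: to, _label: label}
  edges.push(edge)
  from._out.push(edge)       // the tail vertex knows its out-going edge
  to._in.push(edge)          // the head vertex knows its in-coming edge
  return edge
}

var edges = []
var odin = makeVertex('Odin'), thor = makeVertex('Thor'), baldr = makeVertex('Baldr')
connect(edges, thor,  odin, 'parent')
connect(edges, baldr, odin, 'parent')

// independent of graph size: just read the vertex's own list
var adjacent = odin._in
// proportional to the number of edges: scan them all, as in the naive version
var scanned = edges.filter(function(edge) { return edge._in._id == odin._id })</code></pre>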

<p>Run these yourself to experience the graph database difference <a href="#fn32" class="footnoteRef" id="fnref32"><sup>32</sup></a>.</p>

<h2 id="serialization">Serialization</h2>

<p>Having a graph in memory is great, but how do we get it there in the first place? We saw that our graph constructor can take a list of vertices and edges and create a graph for us, but once the graph has been built how do we get the vertices and edges back out?</p>

<p>Our natural inclination is to do something like <code>JSON.stringify(graph)</code>, which produces the terribly helpful error &quot;TypeError: Converting circular structure to JSON&quot;. During the graph construction process the vertices were linked to their edges, and the edges are all linked to their vertices, so now everything refers to everything else. So how can we extract our nice neat lists again? JSON replacer functions to the rescue.</p>
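<p>Two vertices and one edge are enough to reproduce the problem. The ECMAScript specification requires <code>JSON.stringify</code> to throw a <code>TypeError</code> on a cyclic structure, though the message text varies by engine. The objects below are plain stand-ins shaped like Dagoba's vertices and edges.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">// Stand-ins linked the way the graph constructor links them.
var thor = {_id: 'Thor', _in: [], _out: []}
var odin = {_id: 'Odin', _in: [], _out: []}
var edge = {_out: thor, _in: odin, _label: 'parent'}
thor._out.push(edge)       // vertex -> edge
odin._in.push(edge)        // vertex -> edge, while the edge points back at both

var error = null
try {
  JSON.stringify({V: [thor, odin], E: [edge]})
} catch (e) {
  error = e                // e.g. "Converting circular structure to JSON"
}</code></pre>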

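<p>The <code>JSON.stringify</code> function takes a value to stringify, but it also takes two optional parameters: a replacer, usually a function, and a <code>space</code> argument (a number or a string) that controls indentation <a href="#fn33" class="footnoteRef" id="fnref33"><sup>33</sup></a>. The replacer function is called with each key and value as the structure is serialized, and whatever it returns is used in place of the original value; returning <code>undefined</code> leaves that property out entirely.</p>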

<p>We need to treat the vertices and edges a bit differently, so we're going to manually merge the two sides into a single JSON string.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="fu">jsonify</span> = <span class="kw">function</span>(graph) {
  <span class="kw">return</span> <span class="st">&#39;{&quot;V&quot;:&#39;</span> + <span class="ot">JSON</span>.<span class="fu">stringify</span>(<span class="ot">graph</span>.<span class="fu">vertices</span>, <span class="ot">Dagoba</span>.<span class="fu">cleanVertex</span>)
       + <span class="st">&#39;,&quot;E&quot;:&#39;</span> + <span class="ot">JSON</span>.<span class="fu">stringify</span>(<span class="ot">graph</span>.<span class="fu">edges</span>,    <span class="ot">Dagoba</span>.<span class="fu">cleanEdge</span>)
       + <span class="st">&#39;}&#39;</span>
}</code></pre>

<p>And these are the replacers for vertices and edges.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="fu">cleanVertex</span> = <span class="kw">function</span>(key, value) {
  <span class="kw">return</span> (key == <span class="st">&#39;_in&#39;</span> || key == <span class="st">&#39;_out&#39;</span>) ? <span class="kw">undefined</span> : value
}

<span class="ot">Dagoba</span>.<span class="fu">cleanEdge</span> = <span class="kw">function</span>(key, value) {
  <span class="kw">return</span> (key == <span class="st">&#39;_in&#39;</span> || key == <span class="st">&#39;_out&#39;</span>) ? <span class="ot">value</span>.<span class="fu">_id</span> : value
}</code></pre>

<p>The only difference between them is what they do when a cycle is about to be formed: for vertices, we skip the edge list entirely. For edges, we replace each vertex with its ID. That gets rid of all the cycles we created while building the graph.</p>
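<p>Running the two replacers on a tiny hand-built graph shows what actually reaches the string. The plain objects stand in for the output of Dagoba's graph constructor. Vertices keep only their own properties, and edges keep their vertex IDs, so the result parses back cleanly.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">// jsonify and the replacers as defined above, on a minimal Dagoba object.
var Dagoba = {}
Dagoba.cleanVertex = function(key, value) {
  return (key == '_in' || key == '_out') ? undefined : value   // drop edge lists
}
Dagoba.cleanEdge = function(key, value) {
  return (key == '_in' || key == '_out') ? value._id : value   // vertex becomes its id
}
Dagoba.jsonify = function(graph) {
  return '{"V":' + JSON.stringify(graph.vertices, Dagoba.cleanVertex)
       + ',"E":' + JSON.stringify(graph.edges,    Dagoba.cleanEdge)
       + '}'
}

var thor = {_id: 'Thor', _in: [], _out: []}
var odin = {_id: 'Odin', _in: [], _out: []}
var edge = {_out: thor, _in: odin, _label: 'parent'}
thor._out.push(edge)
odin._in.push(edge)

var json = Dagoba.jsonify({vertices: [thor, odin], edges: [edge]})
// '{"V":[{"_id":"Thor"},{"_id":"Odin"}],"E":[{"_out":"Thor","_in":"Odin","_label":"parent"}]}'</code></pre>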

<p>We're manually manipulating JSON in <code>Dagoba.jsonify</code>, which generally isn't recommended as the JSON format is rather persnickety. Even in a dose this small it's easy to miss something and hard to visually confirm correctness.</p>

<p>We could merge the two replacer functions into a single function, and use that new replacer function over the whole graph by doing <code>JSON.stringify(graph, my_cool_replacer)</code>. This frees us from having to manually massage the JSON output, but the resulting code may be quite a bit messier. Try it yourself and see if you can come up with a well-factored solution that avoids hand-coded JSON. (Bonus points if it fits in a tweet.)</p>

<h2 id="persistence">Persistence</h2>

<p>Persistence is usually one of the trickier parts of a database: disks are relatively safe but slow. Batching writes, making them atomic, journaling—these are difficult to make both fast and correct.</p>

<p>Fortunately, we're building an <em>in-memory</em> database, so we don't have to worry about any of that! We may, though, occasionally want to save a copy of the database locally for fast restart on page load. We can use the serializer we just built to do exactly that. First let's wrap it in a helper function:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="ot">G</span>.<span class="fu">toString</span> = <span class="kw">function</span>() { <span class="kw">return</span> <span class="ot">Dagoba</span>.<span class="fu">jsonify</span>(<span class="kw">this</span>) }</code></pre>

<p>In JavaScript an object's <code>toString</code> function is called whenever that object is coerced into a string. So if <code>g</code> is a graph, then <code>g+''</code> will be the graph's serialized JSON string.</p>
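<p>The stand-in object below, which is not a real Dagoba graph, illustrates this. Strictly speaking, <code>+</code> first asks for the object's primitive value through <code>valueOf</code>. A plain object's <code>valueOf</code> returns the object itself, so JavaScript falls back to <code>toString</code>. An explicit <code>String()</code> call goes to <code>toString</code> directly.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">// Hypothetical stand-in for a graph whose toString returns its JSON form.
var fakeGraph = {
  vertices: [{_id: 'Thor'}],
  toString: function() { return JSON.stringify({V: this.vertices, E: []}) }
}

var viaPlus   = fakeGraph + ''        // implicit coercion ends up calling toString
var viaString = String(fakeGraph)     // so does an explicit String() call
// both are '{"V":[{"_id":"Thor"}],"E":[]}'</code></pre>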

<p>The <code>fromString</code> function isn't part of the language specification, but it's handy to have around.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="fu">fromString</span> = <span class="kw">function</span>(str) {             <span class="co">// another graph constructor</span>
  <span class="kw">var</span> obj = <span class="ot">JSON</span>.<span class="fu">parse</span>(str)                     <span class="co">// this can throw</span>
  <span class="kw">return</span> <span class="ot">Dagoba</span>.<span class="fu">graph</span>(<span class="ot">obj</span>.<span class="fu">V</span>, <span class="ot">obj</span>.<span class="fu">E</span>)
}</code></pre>

<p>Now we'll use those in our persistence functions. The <code>toString</code> function is hiding—can you spot it?</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">Dagoba</span>.<span class="fu">persist</span> = <span class="kw">function</span>(graph, name) {
  name = name || <span class="st">&#39;graph&#39;</span>
  <span class="ot">localStorage</span>.<span class="fu">setItem</span>(<span class="st">&#39;DAGOBA::&#39;</span>+name, graph)
}

<span class="ot">Dagoba</span>.<span class="fu">depersist</span> = <span class="kw">function</span> (name) {
  name = <span class="st">&#39;DAGOBA::&#39;</span> + (name || <span class="st">&#39;graph&#39;</span>)
  <span class="kw">var</span> flatgraph = <span class="ot">localStorage</span>.<span class="fu">getItem</span>(name)
  <span class="kw">return</span> <span class="ot">Dagoba</span>.<span class="fu">fromString</span>(flatgraph)
}</code></pre>
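<p>The code above doesn't handle one case. <code>localStorage.getItem</code> returns <code>null</code> for a key that was never set, and <code>JSON.parse(null)</code> quietly returns <code>null</code>, so <code>fromString</code> then fails on <code>obj.V</code>. The sketch below is a more defensive pair, written against a hypothetical in-memory object with the same <code>getItem</code>/<code>setItem</code> methods so it runs outside a browser. It returns the parsed object where Dagoba would rebuild a graph.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">// Hypothetical in-memory stand-in for window.localStorage (stores strings only).
function memoryStorage() {
  var data = Object.create(null)
  return {
    getItem: function(key) { return key in data ? data[key] : null },
    setItem: function(key, value) { data[key] = String(value) }   // calls toString
  }
}

function persist(storage, graph, name) {
  storage.setItem('DAGOBA::' + (name || 'graph'), graph)
}

function depersist(storage, name) {
  var flat = storage.getItem('DAGOBA::' + (name || 'graph'))
  if (flat === null) return null          // nothing saved under that name
  return JSON.parse(flat)                 // Dagoba would call Dagoba.graph(obj.V, obj.E)
}

var storage = memoryStorage()
var fakeGraph = {toString: function() { return '{"V":[{"_id":"Thor"}],"E":[]}' }}
persist(storage, fakeGraph)
var restored = depersist(storage)          // {V: [{_id: 'Thor'}], E: []}
var missing  = depersist(storage, 'nope')  // null rather than a TypeError</code></pre>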

<p>We preface the name with a faux namespace to avoid polluting the <code>localStorage</code> properties of the domain, as it can get quite crowded in there. There's also usually a low storage limit, so for larger graphs we'd probably want to use a Blob of some sort.</p>

<p>There are also potential issues if multiple browser windows from the same domain are persisting and depersisting simultaneously. The <code>localStorage</code> space is shared between those windows, and they're potentially on different event loops, so there's the possibility of one carelessly overwriting the work of another. The spec says a mutex should guard read/write access to <code>localStorage</code>, but it's inconsistently implemented between different browsers, and even with it a simple implementation like ours could still encounter issues.</p>

<p>If we wanted our persistence implementation to be multi-window–concurrency aware, then we could make use of the storage events that are fired when <code>localStorage</code> is changed to update our local graph accordingly.</p>
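<p>A sketch of that idea follows. The <code>storage</code> event fires in the <em>other</em> windows of the same origin, not in the window that made the change. It carries the changed <code>key</code> and its <code>newValue</code>. The listener factory below is our own hypothetical helper, and <code>onChange</code> would typically rebuild the graph with <code>Dagoba.fromString</code>. It is written so a plain object can stand in for a real <code>StorageEvent</code>.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">// Hypothetical helper: returns a listener that reacts only to our graph's key.
function graphStorageListener(name, onChange) {
  var key = 'DAGOBA::' + (name || 'graph')
  return function(event) {
    if (event.key !== key) return          // another entry changed, or storage was cleared
    if (event.newValue === null) return     // our entry was removed
    onChange(event.newValue)                // serialized graph written by another window
  }
}

// In a browser: window.addEventListener('storage', graphStorageListener('graph', reload))
var received = []
var listener = graphStorageListener('graph', function(json) { received.push(json) })
listener({key: 'DAGOBA::graph', newValue: '{"V":[],"E":[]}'})
listener({key: 'unrelated', newValue: 'x'})
// received is ['{"V":[],"E":[]}']</code></pre>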

<h2 id="updates">Updates</h2>

<p>Our <code>out</code> pipetype copies the vertex's out-going edges and pops one off each time it needs one. Building that new data structure takes time and space, and pushes more work onto the memory manager. We could have instead used the vertex's out-going edge list directly, keeping track of our place with a counter variable. Can you think of a problem with that approach?</p>

<p>If someone deletes an edge we've visited while we're in the middle of a query, that would change the size of our edge list, and we'd then skip an edge because our counter would be off. To solve this we could lock the vertices involved in our query, but then we'd either lose our capacity to regularly update the graph, or the ability to have long-lived query objects responding to requests for more results on-demand. Even though we're in a single-threaded event loop, our queries can span multiple asynchronous re-entries, which means concurrency concerns like this are a very real problem.</p>
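<p>The skipped edge is easy to reproduce with plain arrays standing in for a vertex's edge list. This is a simplification: the real pipetype pops from the end of its copy, but the order doesn't matter here. Deleting the edge we just visited shifts everything after it one slot to the left, so the counter jumps over its neighbour. Walking a copy is immune.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">// Visit each edge; `onVisit` may mutate the live list (e.g. delete an edge).
function walkWithCounter(edges, onVisit) {
  var seen = []
  for (var i = 0; edges.length > i; i++) {
    seen.push(edges[i])
    onVisit(edges[i], edges)
  }
  return seen
}

function walkWithCopy(edges, onVisit) {
  var seen = []
  var copy = edges.slice()                 // like the out pipetype's copied list
  while (copy.length) {
    var edge = copy.shift()
    seen.push(edge)
    onVisit(edge, edges)
  }
  return seen
}

// Someone deletes 'e1' right after we've visited it.
function deleteE1(edge, edges) {
  if (edge === 'e1') edges.splice(edges.indexOf('e1'), 1)
}

var withCounter = walkWithCounter(['e1', 'e2', 'e3'], deleteE1)  // ['e1', 'e3']
var withCopy    = walkWithCopy(['e1', 'e2', 'e3'], deleteE1)     // ['e1', 'e2', 'e3']</code></pre>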

<p>So we'll pay the performance price to copy the edge list. There's still a problem, though, in that long-lived queries may not see a completely consistent chronology. We will traverse every edge belonging to a vertex at the moment we visit it, but we visit vertices at different clock times during our query. Suppose we save a query like <code>var q = g.v('Odin').children().children().take(2)</code> and then call <code>q.run()</code> to gather two of Odin's grandchildren. Some time later we need to pull another two grandchildren, so we call <code>q.run()</code> again. If Odin has had a new grandchild in the intervening time, we may or may not see it, depending on whether the parent vertex was visited the first time we ran the query.</p>

<p>One way to fix this non-determinism is to change the update handlers to add versioning to the data. We'll then change the driver loop to pass the graph's current version into the query, so we're always seeing a consistent view of the world as it existed when the query was first initialized. Adding versioning to our database also opens the door to true transactions, and automated rollback/retries in an STM-like fashion.</p>
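<p>The sketch below shows what versioned data could look like, under assumptions of our own. Every edge records the graph version at which it was added. If the edge is deleted, it also records the version at which it was removed, so deletion becomes a tombstone rather than a splice. A query captures the version when it starts and filters every edge list through it, so a grandchild added later stays invisible to that query. The names here are hypothetical, and edges point at vertex IDs for brevity.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">// Hypothetical versioned graph: edges are never spliced out, only tombstoned.
var graph = {version: 0, edges: []}

function addEdge(g, from, to) {
  g.version++
  var edge = {_out: from, _in: to, added: g.version, removed: null}
  g.edges.push(edge)
  return edge
}

function removeEdge(g, edge) {
  g.version++
  edge.removed = g.version
}

// Was this edge part of the graph as of `version`?
function visibleAt(edge, version) {
  if (edge.added > version) return false        // created after the snapshot
  if (edge.removed === null) return true        // never deleted
  return edge.removed > version                 // deleted after the snapshot
}

function inEdgesAt(g, vertexId, version) {
  return g.edges
    .filter(function(edge) { return edge._in === vertexId })
    .filter(function(edge) { return visibleAt(edge, version) })
}

addEdge(graph, 'Thor', 'Odin')
var baldrEdge = addEdge(graph, 'Baldr', 'Odin')
var snapshot = graph.version                  // a query starting now sees version 2
addEdge(graph, 'Vidar', 'Odin')               // added after the query started
removeEdge(graph, baldrEdge)                  // and deleted after it started

var asOfQuery = inEdgesAt(graph, 'Odin', snapshot)       // Thor and Baldr
var asOfNow   = inEdgesAt(graph, 'Odin', graph.version)  // Thor and Vidar</code></pre>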

<h2 id="future-directions">Future Directions</h2>

<p>We saw one way of gathering ancestors earlier:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">g</span>.<span class="fu">v</span>(<span class="st">&#39;Thor&#39;</span>).<span class="fu">out</span>().<span class="fu">as</span>(<span class="st">&#39;parent&#39;</span>)
           .<span class="fu">out</span>().<span class="fu">as</span>(<span class="st">&#39;grandparent&#39;</span>)
           .<span class="fu">out</span>().<span class="fu">as</span>(<span class="st">&#39;great-grandparent&#39;</span>)
           .<span class="fu">merge</span>([<span class="st">&#39;parent&#39;</span>, <span class="st">&#39;grandparent&#39;</span>, <span class="st">&#39;great-grandparent&#39;</span>])
           .<span class="fu">run</span>()</code></pre>

<p>This is pretty clumsy, and doesn't scale well—what if we wanted six layers of ancestors? Or to look through an arbitrary number of ancestors until we found what we wanted?</p>

<p>It'd be nice if we could say something like this instead:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">g</span>.<span class="fu">v</span>(<span class="st">&#39;Thor&#39;</span>).<span class="fu">out</span>().<span class="fu">all</span>().<span class="fu">times</span>(<span class="dv">3</span>).<span class="fu">run</span>()</code></pre>

<p>What we'd like to get out of this is something like the query above—maybe:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">g</span>.<span class="fu">v</span>(<span class="st">&#39;Thor&#39;</span>).<span class="fu">out</span>().<span class="fu">as</span>(<span class="st">&#39;a&#39;</span>)
           .<span class="fu">out</span>().<span class="fu">as</span>(<span class="st">&#39;b&#39;</span>)
           .<span class="fu">out</span>().<span class="fu">as</span>(<span class="st">&#39;c&#39;</span>)
           .<span class="fu">merge</span>([<span class="st">&#39;a&#39;</span>, <span class="st">&#39;b&#39;</span>, <span class="st">&#39;c&#39;</span>])
           .<span class="fu">run</span>()</code></pre>

<p>after the query transformers have all run. We could run the <code>times</code> transformer first, to produce:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">g</span>.<span class="fu">v</span>(<span class="st">&#39;Thor&#39;</span>).<span class="fu">out</span>().<span class="fu">all</span>().<span class="fu">out</span>().<span class="fu">all</span>().<span class="fu">out</span>().<span class="fu">all</span>().<span class="fu">run</span>()</code></pre>

<p>Then run the <code>all</code> transformer and have it transform each <code>all</code> into a uniquely labeled <code>as</code>, and put a <code>merge</code> after the last <code>as</code>.</p>
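<p>Here is a rough sketch of those two transformers over a bare program array. It fills in details the text leaves open, all of which are our assumptions. Steps are <code>[pipetype, args]</code> pairs. The first step is the starting <code>vertex</code> step and is never repeated. <code>times</code> repeats everything between that step and itself. The merge step is written as <code>['merge', labels]</code>, and the <code>all_0</code>, <code>all_1</code> label names are hypothetical; they only need to be unique within the query.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">// Hypothetical `times` transformer. Assumes program[0] is the starting vertex
// step and that ['times', [n]] repeats every step between it and the 'times'.
function timesTransformer(program) {
  var at = program.findIndex(function(step) { return step[0] === 'times' })
  if (at === -1) return program
  var count = Math.max(0, program[at][1][0] || 0)
  var body = program.slice(1, at)
  var repeated = []
  for (var k = 0; count > k; k++) repeated = repeated.concat(body)
  return program.slice(0, 1).concat(repeated, program.slice(at + 1))
}

// Hypothetical `all` transformer: each ['all', []] becomes a uniquely labeled
// 'as', and one ['merge', labels] step goes right after the last of them.
function allTransformer(program) {
  var labels = []
  var lastAll = -1
  var out = program.map(function(step, i) {
    if (step[0] !== 'all') return step
    var label = 'all_' + labels.length
    labels.push(label)
    lastAll = i
    return ['as', [label]]
  })
  if (lastAll === -1) return program
  out.splice(lastAll + 1, 0, ['merge', labels])
  return out
}

var program = [['vertex', ['Thor']], ['out', []], ['all', []], ['times', [3]]]
var result = allTransformer(timesTransformer(program))
// step names: vertex, out, as, out, as, out, as, merge
// last step: ['merge', ['all_0', 'all_1', 'all_2']]</code></pre>

<p>Running <code>times</code> before <code>all</code> is what gives the three copies of <code>all</code> distinct labels. The sketch still has the limitations discussed next.</p>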

<p>There are a few problems with this, though. For one, this <code>as</code>/<code>merge</code> technique only works if every pathway is present in the graph: if we're missing an entry for one of Thor's great-grandparents, then the gremlin following that branch never reaches the <code>merge</code>, so a parent or grandparent reached only along that branch is skipped even though it's a valid entry. For another, what happens if we want to do this to just part of a query and not the whole thing? What if there are multiple <code>all</code>s?</p>

<p>To solve that first problem we're going to have to treat <code>all</code>s as something more than just as/merge. We need each parent gremlin to actually skip the intervening steps. We can think of this as a kind of teleportation—jumping from one part of the pipeline directly to another—or we can think of it as a certain kind of branching pipeline, but either way it complicates our model somewhat. Another approach would be to think of the gremlin as passing through the intervening pipes in a sort of suspended animation, until awoken by a special pipe. Scoping the suspending/unsuspending pipes may be tricky, however.</p>

<p>The next two problems are easier. To modify just part of a query we'll wrap that portion in special start/end steps, like <code>g.v('Thor').out().start().in().out().end().times(4).run()</code>. Actually, if the interpreter knows about these special pipetypes we don't need the end step, because the end of a sequence is always a special pipetype. We'll call these special pipetypes &quot;adverbs&quot;, because they modify regular pipetypes like adverbs modify verbs.</p>
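<p>One way a <code>times</code> transformer could honor a <code>start</code> marker is sketched below as a standalone function. Everything here is an assumption for illustration: the <code>expandTimes</code> name, the <code>[pipetype, args]</code> step format, and the rule that <code>times(n)</code> repeats the steps since the nearest preceding <code>start</code>. Any explicit <code>end</code> is discarded, since <code>times</code> already closes the scope.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">// Hypothetical sketch: expand `start() ... times(n)` by repeating the steps
// between the nearest preceding `start` marker and `times`, n times over.
// Steps are [pipetype, args] pairs; an optional `end` marker is dropped,
// since `times` itself closes the scope.
function expandTimes(program) {
  const result = []
  for (const step of program) {
    if (step[0] !== 'times') { result.push(step); continue }
    const begin = result.map(s => s[0]).lastIndexOf('start')
    if (begin === -1) throw new Error('times() needs a preceding start()')
    // Cut out start + segment, then drop the start marker and any end marker
    const body = result.splice(begin).slice(1).filter(s => s[0] !== 'end')
    for (let n = step[1][0]; n > 0; n--) result.push(...body)
  }
  return result
}

// g.v('Thor').out().start().in().out().end().times(4), as a list of steps
const partial = [
  ['vertex', ['Thor']], ['out', []],
  ['start', []], ['in', []], ['out', []], ['end', []],
  ['times', [4]]
]
console.log(expandTimes(partial).map(s => s[0]).join('.'))
// vertex.out.in.out.in.out.in.out.in.out</code></pre>

<p>Because each <code>times</code> is expanded as soon as it is reached, an inner <code>start</code>/<code>times</code> pair is flattened before the outer <code>times</code> looks for its own <code>start</code>, so nested scopes fall out of the same loop.</p>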

<p>To handle multiple <code>all</code>s we need to run every <code>all</code> transformer twice: once before <code>times</code>, to mark each <code>all</code> uniquely, and again after <code>times</code>, to give each copy of a marked <code>all</code> its own unique mark.</p>
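<p>The two passes might look something like this. This is a hypothetical sketch: <code>markAlls</code> and <code>remarkAlls</code> are illustrative names, the step format is the same assumed <code>[pipetype, args]</code> list, and a hand-written copy stands in for the real <code>times</code> transformer. Keeping the first-pass mark as a prefix is one possible design choice; it records which original <code>all</code> each copy came from.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">// Hypothetical sketch of the two passes. Pass one tags each `all` with a
// unique mark; `times` then copies those tagged steps; pass two gives every
// copy its own label while keeping the original mark as a prefix.
function markAlls(program) {
  let next = 0
  return program.map(step => step[0] === 'all' ? ['all', ['all' + next++]] : step)
}

function remarkAlls(program) {
  const seen = {}                          // copies seen so far, per original mark
  return program.map(function(step) {
    if (step[0] !== 'all') return step
    const mark = step[1][0]
    const copy = seen[mark] || 0
    seen[mark] = copy + 1
    return ['all', [mark + '_' + copy]]    // 'all0_1' = second copy of the first all
  })
}

const marked = markAlls([['vertex', ['Thor']], ['out', []], ['all', []], ['in', []], ['all', []]])
// Stand-in for times(2): repeat everything after the vertex step twice
const copied = [marked[0]].concat(marked.slice(1), marked.slice(1))
console.log(remarkAlls(copied).filter(s => s[0] === 'all').map(s => s[1][0]))
// [ 'all0_0', 'all1_0', 'all0_1', 'all1_1' ]</code></pre>

<p>The four copies end up with distinct labels, so they can no longer collide when they are later turned into <code>as</code> steps.</p>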

<p>There's still the issue of searching through an unbounded number of generations. For example, how do we find out which of Ymir's descendants are scheduled to survive Ragnarök? We could make individual queries like <code>g.v('Ymir').in().filter({survives: true})</code> and <code>g.v('Ymir').in().in().in().in().filter({survives: true})</code>, and manually collect the results ourselves, but that's pretty awful.</p>

<p>We'd like to use an adverb like this:</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript"><span class="ot">g</span>.<span class="fu">v</span>(<span class="st">&#39;Ymir&#39;</span>).<span class="fu">in</span>().<span class="fu">filter</span>({<span class="dt">survives</span>: <span class="kw">true</span>}).<span class="fu">every</span>()</code></pre>

<p>which would work like <code>all</code>+<code>times</code> but without enforcing a limit. We may want to impose a particular strategy on the traversal, though, like a stolid BFS or YOLO DFS, so <code>g.v('Ymir').in().filter({survives: true}).bfs()</code> would be more flexible. Phrasing it this way allows us to state complicated queries like &quot;check for Ragnarök survivors, skipping every other generation&quot; in a straightforward fashion: <code>g.v('Ymir').in().filter({survives: true}).in().bfs()</code>.</p>
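<p>Underneath, a <code>bfs()</code> adverb would have to do something like the following breadth-first walk. This is a minimal sketch in plain JavaScript, not Dagoba code. <code>bfsDescendants</code> and the toy data are hypothetical, and the sketch assumes, as in the Thor examples, that <code>out()</code> leads to parents, so <code>in()</code> leads to children. As with the hand-written queries, the filter only selects which descendants are reported; it does not stop the walk at non-survivors.</p>

<pre class="sourceCode javascript"><code class="sourceCode javascript">// Hypothetical sketch of the work a bfs() adverb would do for
// g.v('Ymir').in().filter({survives: true}).bfs(), written as plain JS over
// an adjacency map instead of Dagoba's pipeline machinery.
function bfsDescendants(children, root, keep) {
  const results = []
  const seen = new Set([root])              // shared descendants are visited once
  let frontier = [root]
  while (frontier.length) {                 // one iteration per generation
    const next = []
    for (const vertex of frontier) {
      for (const child of children[vertex] || []) {
        if (seen.has(child)) continue
        seen.add(child)
        next.push(child)
        if (keep(child)) results.push(child)  // the filter picks results; it doesn't prune the walk
      }
    }
    frontier = next
  }
  return results
}

// Toy data (hypothetical): `children` maps a vertex to what in() reaches
const children = { Ymir: ['A', 'B'], A: ['C'], B: ['C', 'D'], C: ['E'] }
const survivors = new Set(['B', 'E'])
console.log(bfsDescendants(children, 'Ymir', v => survivors.has(v)))  // [ 'B', 'E' ]</code></pre>

<p>The <code>seen</code> set keeps shared descendants (like <code>C</code>, reachable through both <code>A</code> and <code>B</code>) from being visited twice, and keeps a cyclic graph from looping forever. Swapping the per-generation frontier for a stack would give the DFS variant, which reports results in depth-first rather than generation order.</p>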

<h2 id="wrapping-up">Wrapping Up</h2>

<p>So what have we learned? Graph databases are great for storing interconnected<a href="#fn34" class="footnoteRef" id="fnref34"><sup>34</sup></a> data that you plan to query via graph traversals. Adding non-strict semantics allows for a fluent interface over queries you could never express in an eager system for performance reasons, and allows you to cross async boundaries. Time makes things complicated, and time from multiple perspectives (i.e., concurrency) makes things very complicated, so whenever we can avoid introducing a temporal dependency (e.g., state, observable effects, etc.) we make reasoning about our system easier. Building in a simple, decoupled and painfully unoptimized style leaves the door open for global optimizations later on, and using a driver loop allows for orthogonal optimizations, each without introducing the brittleness and complexity that is the hallmark of most optimization techniques.</p>

<p>That last point can't be overstated: keep it simple. Eschew optimization in favor of simplicity. Work hard to achieve simplicity by finding the right model. Explore many possibilities. The chapters in this book provide ample evidence that highly non-trivial applications can have a small, tight kernel. Once you find that kernel for the application you are building, fight to keep complexity from polluting it. Build hooks for attaching additional functionality, and maintain your abstraction barriers at all costs. Using these techniques well is not easy, but they can give you leverage over otherwise intractable problems.</p>

<h3 id="acknowledgements">Acknowledgements</h3>

<p>Many thanks are due to Amy Brown, Michael DiBernardo, Colin Lupton, Scott Rostrup, Michael Russo, Erin Toliver, and Leo Zovic for their invaluable contributions to this chapter.</p>

<div class="footnotes">
<ol>
<li id="fn1"><p>One of the very first database designs was the hierarchical model, which grouped items into tree-shaped hierarchies and is still used as the basis of IBM's IMS product, a high-speed transaction processing system. Its influence can also be seen in XML, file systems and geographic information storage. The network model, invented by Charles Bachman and standardized by CODASYL, generalized the hierarchical model by allowing multiple parents, forming a DAG instead of a tree. These navigational database models came into vogue in the 1960s and continued their dominance until performance gains made relational databases usable in the 1980s.<a href="#fnref1">↩</a></p></li>
<li id="fn2"><p>Edgar F. Codd developed relational database theory while working at IBM, but Big Blue feared that a relational database would cannibalize the sales of IMS. While IBM eventually built a research prototype called System R, it was based around a new non-relational language called SEQUEL, instead of Codd's original Alpha language. The SEQUEL language was copied by Larry Ellison in his Oracle Database based on pre-launch conference papers, and the name changed to SQL to avoid trademark disputes.<a href="#fnref2">↩</a></p></li>
<li id="fn3"><p>This database started life as a library for managing Directed Acyclic Graphs, or DAGs. Its name &quot;Dagoba&quot; was originally intended to come with a silent 'h' at the end, an homage to the swampy fictional planet, but reading the back of a chocolate bar one day we discovered the sans-h version refers to a place for silently contemplating the connections between things, which seems even more fitting.<a href="#fnref3">↩</a></p></li>
<li id="fn4"><p>The two purposes of this chapter are to teach this process, to build a graph database, and to have fun.<a href="#fnref4">↩</a></p></li>
<li id="fn5"><p>Notice that we're modeling edges as a pair of vertices. Also notice that those pairs are ordered, because we're using arrays. That means we're modeling a <em>directed graph</em>, where every edge has a starting vertex and an ending vertex. Our &quot;dots and lines&quot; visual model becomes a &quot;dots and arrows&quot; model. This adds complexity to our model, because we have to keep track of the direction of edges, but it also allows us to ask more interesting questions, like &quot;which vertices point to vertex 3?&quot; or &quot;which vertex has the most outgoing edges?&quot; If we need to model an undirected graph we could add a reversed edge for each existing edge in our directed graph. It can be cumbersome to go the other direction: simulating a directed graph from an undirected one. Can you think of a way to do it?<a href="#fnref5">↩</a></p></li>
<li id="fn6"><p>It's also lax in the other direction: all functions are variadic, and all arguments are available by position via the <code>arguments</code> object, which is almost like an array but not quite. (&quot;Variadic&quot; is a fancy way of saying a function has indefinite arity. &quot;A function has indefinite arity&quot; is a fancy way of saying it takes a variable number of variables.)<a href="#fnref6">↩</a></p></li>
<li id="fn7"><p>The <code>Array.isArray</code> checks here are to distinguish our two different use cases, but in general we won't be doing many of the validations one would expect of production code, in order to focus on the architecture instead of the trash bins.<a href="#fnref7">↩</a></p></li>
<li id="fn8"><p>Why can't we just use <code>this.vertices.length</code> here?<a href="#fnref8">↩</a></p></li>
<li id="fn9"><p>Often when faced with space leaks due to deep copying the solution is to use a path-copying persistent data structure, which allows mutation-free changes for only <span class="math">\(\log{}N\)</span> extra space. But the problem remains: if the host application retains a pointer to the vertex data then it can mutate that data any time, regardless of what strictures we impose in our database. The only practical solution is deep copying vertices, which doubles our space usage. Dagoba's original use case involves vertices that are treated as immutable by the host application, which allows us to avoid this issue, but requires a certain amount of discipline on the part of the user.<a href="#fnref9">↩</a></p></li>
<li id="fn10"><p>We could make this decision based on a Dagoba-level configuration parameter, a graph-specific configuration, or possibly some type of heuristic.<a href="#fnref10">↩</a></p></li>
<li id="fn11"><p>We use the term <em>list</em> to refer to the abstract data structure requiring push and iterate operations. We use JavaScript's &quot;array&quot; concrete data structure to fulfill the API required by the list abstraction. Technically both &quot;list of edges&quot; and &quot;array of edges&quot; are correct, so which we use at a given moment depends on context: if we are relying on the specific details of JavaScript arrays, like the <code>.length</code> property, we will say &quot;array of edges&quot;. Otherwise we say &quot;list of edges&quot;, as an indication that any list implementation would suffice.<a href="#fnref11">↩</a></p></li>
<li id="fn12"><p>A tuple is another abstract data structure—one that is more constrained than a list. In particular a tuple has a fixed size: in this case we're using a 2-tuple (also known as a &quot;pair&quot; in the technical jargon of data structure researchers). Using the term for the most constrained abstract data structure required is a nicety for future implementors.<a href="#fnref12">↩</a></p></li>
<li id="fn13"><p>Very short lived garbage though, which is the second best kind.<a href="#fnref13">↩</a></p></li>
<li id="fn14"><p>Two references to the same mutable data structure act like a pair of walkie-talkies, allowing whoever holds them to communicate directly. Those walkie-talkies can be passed around from function to function, and cloned to create a whole lot of walkie-talkies. This completely subverts the natural communication channels your code already possesses. In a system with no concurrency you can sometimes get away with it, but introduce multithreading or asynchronous behavior and all that walkie-talkie squawking can become a real drag.<a href="#fnref14">↩</a></p></li>
<li id="fn15"><p>Uniqueness types were dusted off in the Clean language, and have a non-linear relationship with linear types, which are themselves a subtype of substructural types.<a href="#fnref15">↩</a></p></li>
<li id="fn16"><p>Most modern JS runtimes employ generational garbage collectors, and the language is intentionally kept at arm's length from the engine's memory management to curtail a source of programmatic non-determinism.<a href="#fnref16">↩</a></p></li>
<li id="fn17"><p>The <code>run()</code> at the end of the query invokes the interpreter and returns results.<a href="#fnref17">↩</a></p></li>
<li id="fn18"><p>With weight in skippund and height in fathoms, naturally. Depending on the density of Asgardian flesh this may return many results, or none at all. (Or just Volstagg, if we're allowing Shakespeare by way of Jack Kirby into our pantheon.)<a href="#fnref18">↩</a></p></li>
<li id="fn19"><p>Some would argue it's best to be explicit all the time. Others would argue that a good system for implicits makes for more concise, readable code, with less boilerplate and a smaller surface area for bugs. One thing we can all agree on is that making effective use of JavaScript's implicit coercion requires memorizing a lot of non-intuitive special cases, making it a minefield for the uninitiated.<a href="#fnref19">↩</a></p></li>
<li id="fn20"><p>What would you expect each of those to return? What do they actually return?<a href="#fnref20">↩</a></p></li>
<li id="fn21"><p>There are certain conditions under which this particular query might yield unexpected results. Can you think of any? How could you modify it to handle those cases?<a href="#fnref21">↩</a></p></li>
<li id="fn22"><p>Technically we need to implement an interpreter with non-strict semantics, which means it will only evaluate when forced to do so. Lazy evaluation is a technique used for implementing non-strictness. It's a bit lazy of us to conflate the two, so we will only disambiguate when forced to do so.<a href="#fnref22">↩</a></p></li>
<li id="fn23"><p>Method chaining lets us write <code>g.v('Thor').in().out().run()</code> instead of the six lines of non-fluent JS required to accomplish the same thing.<a href="#fnref23">↩</a></p></li>
<li id="fn24"><p>We call it <code>maybe_gremlin</code> to remind ourselves that it could be a gremlin, or it could be something else. Also because originally it was either a gremlin or Nothing.<a href="#fnref24">↩</a></p></li>
<li id="fn25"><p>Recall that <code>done</code> starts at -1, so the first step's predecessor is always done.<a href="#fnref25">↩</a></p></li>
<li id="fn26"><p>Or, more pointedly, a poorly phrased query is less likely to yield exponential slowdowns. For an end-user of an RDBMS, the aesthetics of query quality can often be quite opaque.<a href="#fnref26">↩</a></p></li>
<li id="fn27"><p>Note that we're keeping the domain of the priority parameter open, so it can be an integer, a rational, a negative number, or even things like Infinity or NaN.<a href="#fnref27">↩</a></p></li>
<li id="fn28"><p>You can learn more about dependency resolution in the Contingent chapter of this book.<a href="#fnref28">↩</a></p></li>
<li id="fn29"><p>The fancy term for this is &quot;index-free adjacency&quot;.<a href="#fnref29">↩</a></p></li>
<li id="fn30"><p>Though only in operation for 18 months due to the arrival of the transcontinental telegraph and the outbreak of the American Civil War, the Pony Express is still remembered today for delivering mail coast to coast in just ten days.<a href="#fnref30">↩</a></p></li>
<li id="fn31"><p>Sharding a graph database requires partitioning the graph. <a href="http://dl.acm.org/citation.cfm?doid=1007912.1007931">Optimal graph partitioning is NP-hard</a>, even for simple graphs like trees and grids, and good approximations also have exponential <a href="http://arxiv.org/pdf/1311.3144v2.pdf">asymptotic complexity</a>.<a href="#fnref31">↩</a></p></li>
<li id="fn32"><p>In modern JavaScript engines filtering a list is quite fast—for small graphs the naive version can actually be faster than the index-free version due to the underlying data structures and the way the code is JIT compiled. Try it with different sizes of graphs to see how the two approaches scale.<a href="#fnref32">↩</a></p></li>
<li id="fn33"><p>Pro tip: Given a deep tree <code>deep_tree</code>, running <code>JSON.stringify(deep_tree, null, 2)</code> in the JS console is a quick way to make it human readable.<a href="#fnref33">↩</a></p></li>
<li id="fn34"><p>Not <em>too</em> interconnected, though—you'd like the number of edges to grow in direct proportion to the number of vertices. In other words, the average number of edges connected to a vertex shouldn't vary with the size of the graph. Most systems we'd consider putting in a graph database already have this property: if Loki had 100,000 additional grandchildren the degree of the Thor vertex wouldn't increase.<a href="#fnref34">↩</a></p></li>
</ol>
</div>
  </body>
</html>
